Would it be possible to have a single script instead of several separate scri...

Created on: May 25, 2025

Answered using O3 by Chat01

Question

Would it be possible to have just one script instead of several separate scripts?

  • Send the complete, updated code in markdown format

````python
# ==============================
# START OF CONTENT: Markdown-Paper2Code\analysis_process.py
# ==============================
import argparse
import copy
import json

import openai

from analysis_prompts import system_prompt, detail_analysis_prompt, long_term_memory_prompt


def parse_analysis_option():
    arg_parser = argparse.ArgumentParser("Paper2Code-Analysis Process")
    arg_parser.add_argument("--plan_json", type=str, default="./outputs/plan_result.json",
                            help="LLM planned json-formatted file")
    arg_parser.add_argument("--base_url", type=str, default="http://localhost:3000/v1",
                            help="LLM API URL endpoint")
    arg_parser.add_argument("--model", type=str, default="QwQ-32B",
                            help="The model name of the local LLM")
    arg_parser.add_argument("--api_key", type=str, default="EMPTY",
                            help="API key of the local LLM")
    arg_parser.add_argument("--analysis_json", type=str, default="./outputs/analysis_result.json",
                            help="Detailed LLM-analyzed json-formatted file")
    args = arg_parser.parse_args()
    return args


def analysis_process(args: argparse.Namespace):
    # load plan json file
    with open(args.plan_json, "r", encoding='utf-8') as f:
        plan_json = json.load(f)
    markdown_paper = plan_json['paper_content']
    overall_plan = plan_json['overall_plan']
    architecture_plan = json.dumps(plan_json['architecture_plan']).encode('utf-8')
    logic_plan = json.dumps(plan_json['logic_plan']).encode('utf-8')
    yaml_plan = plan_json['configuration_plan']
    logic_analysis_list = plan_json['logic_plan']['Logic Analysis']
    long_term_memory = long_term_memory_prompt.format(markdown_paper=markdown_paper,
                                                      overall_plan=overall_plan,
                                                      architecture_plan=architecture_plan,
                                                      logic_plan=logic_plan,
                                                      config_yaml=yaml_plan)
    task_prompts = [[todo_file_name,
                     copy.deepcopy(detail_analysis_prompt).format(todo_file_name=todo_file_name,
                                                                  todo_file_desc=todo_file_desc)]
                    for todo_file_name, todo_file_desc in logic_analysis_list]
    result_dict = {'long_term_memory': long_term_memory}
    messages = [{'role': 'system', 'content': system_prompt + "\n\n-----\n\n" + long_term_memory}]
    openai_server = openai.OpenAI(base_url=args.base_url, api_key=args.api_key, timeout=3000)
    for item in task_prompts:
        print(f"Current Analysing File: {item[0]}")
        messages.append({'role': 'user', 'content': item[1]})
        result = openai_server.chat.completions.create(model=args.model,
                                                       messages=messages,
                                                       temperature=0.95,
                                                       timeout=3000)
        response = result.choices[0].message.content.strip()
        if "</think>" in response:
            response = response.split("</think>")[-1].strip()
        result_dict[item[0]] = response
        messages.append({'role': 'assistant', 'content': response})
    return result_dict


if __name__ == '__main__':
    args = parse_analysis_option()
    result_dict = analysis_process(args)
    with open(args.analysis_json, 'w', encoding='utf-8') as f:
        json.dump(result_dict, f)
# ------------------------------
# END OF CONTENT: Markdown-Paper2Code\analysis_process.py
# ------------------------------


# ==============================
# START OF CONTENT: Markdown-Paper2Code\analysis_prompts.py
# ==============================
system_prompt = """
**Role**: You are an expert researcher, strategic analyzer and software engineer with a deep understanding of experimental design and reproducibility in scientific research.
You will receive a research paper in JSON format, an overview of the plan, a design in JSON format consisting of "Implementation approach", "File list", "Data structures and interfaces", and "Program call flow", followed by a task in JSON format that includes "Required packages", "Required other language third-party packages", "Logic Analysis", and "Task list", along with a configuration file named "config.yaml".

**Task**: Your task is to conduct a comprehensive logic analysis to accurately reproduce the experiments and methodologies described in the research paper. This analysis must align precisely with the paper's methodology, experimental setup, and evaluation criteria.
1. Align with the Paper: Your analysis must strictly follow the methods, datasets, model configurations, hyperparameters, and experimental setups described in the paper.
2. Be Clear and Structured: Present your analysis in a logical, well-organized, and actionable format that is easy to follow and implement.
3. Prioritize Efficiency: Optimize the analysis for clarity and practical implementation while ensuring fidelity to the original experiments.
4. Follow the design: YOU MUST FOLLOW "Data structures and interfaces". DON'T CHANGE ANY DESIGN. Do not use public member functions that do not exist in your design.
5. REFER TO CONFIGURATION: Always reference settings from the config.yaml file. Do not invent or assume any values; only use configurations explicitly provided.
"""

detail_analysis_prompt = """
## Instruction
**Objective**: Write the logic analysis in '{todo_file_name}', which is intended for '{todo_file_desc}'.

**Constraints**:
* Conduct a Logic Analysis to assist in writing the code;
* Base it on the paper, plan, design, task and the previously specified configuration file (config.yaml), which are in your long-term memory;
* You DON'T need to provide the actual code yet; focus on a thorough, clear analysis;
* You only need to analyze the specific objective of the task, focused on the file '{todo_file_name}';
* Your result is markdown formatted like the output format below.

## Output Format
```markdown
Your detailed logic analysis on the file {todo_file_name}
...
```
"""

long_term_memory_prompt = """
## Long-term memory

**Paper Content**:
```markdown
{markdown_paper}
```

**Overview of the plan**:
```markdown
{overall_plan}
```

**Design**:
```json
{architecture_plan}
```

**Task**:
```json
{logic_plan}
```

**Configuration file**:
```yaml
{config_yaml}
```
"""

# ------------------------------
# END OF CONTENT: Markdown-Paper2Code\analysis_prompts.py
# ------------------------------


# ==============================
# START OF CONTENT: Markdown-Paper2Code\coding_process.py
# ==============================

import argparse
import copy
import json
import os
import openai

from utils import extract_code_from_python
from coding_prompts import detail_coding_prompt, system_prompt

def parse_analysis_option():
    arg_parser = argparse.ArgumentParser("Paper2Code-Coding Process")
    arg_parser.add_argument("--analysis_json", type=str, default="./outputs/analysis_result.json",
                            help="Input json-formatted analysis file")
    arg_parser.add_argument("--base_url", type=str, default="https://api.deepseek.com/v1",
                            help="LLM API URL endpoint")
    arg_parser.add_argument("--model", type=str, default="deepseek-chat",
                            help="The model name of the remote LLM")
    arg_parser.add_argument("--api_key", type=str, default="",
                            help="API key of the remote LLM")
    arg_parser.add_argument("--save_dir", type=str, default="./outputs",
                            help="Directory to save generated code")
    arg_parser.add_argument("--coding_json", type=str, default="./outputs/coding_result.json",
                            help="The LLM generated code for the paper")
    args = arg_parser.parse_args()
    return args

def coding_process(args: argparse.Namespace):
    # load the analysis json file
    with open(args.analysis_json, "r", encoding='utf-8') as f:
        analysis_json = json.load(f)
    file_list = [key for key in analysis_json.keys() if key.endswith(".py")]
    long_term_memory = analysis_json['long_term_memory']
    messages = [{'role': 'system', 'content': system_prompt + "\n\n-----\n\n" + long_term_memory}]
    finished_code_files = "## Finished Code Files\n"
    finished_code_file_list = []
    result_dict = {}
    openai_server = openai.OpenAI(base_url=args.base_url, api_key=args.api_key, timeout=3000)
    for idx, todo_file_name in enumerate(file_list):
        print(f"Current Code File: {todo_file_name}")
        task_prompt = copy.deepcopy(detail_coding_prompt).format(todo_file_name=todo_file_name,
                                                                 detailed_logic_analysis=analysis_json[todo_file_name],
                                                                 done_file_list=str(finished_code_file_list))
        if idx == 0:
            messages.append({'role': 'user', 'content': task_prompt})
        else:
            messages[-1]['content'] = finished_code_files + "\n\n-----\n\n" + task_prompt
        result = openai_server.chat.completions.create(model=args.model,
                                                       messages=messages,
                                                       temperature=0.95,
                                                       timeout=3000)
        # extract code from response
        response = result.choices[0].message.content.strip()
        if "</think>" in response:
            response = response.split("</think>")[-1].strip()
        finished_code_files += f"{todo_file_name}:\n" + response + "\n"
        result_dict[todo_file_name] = extract_code_from_python(response)

        # save generated code to file, creating subdirectories as needed
        if "/" in todo_file_name:
            todo_file_subdir = os.path.join(args.save_dir, os.path.dirname(todo_file_name))
            if not os.path.exists(todo_file_subdir):
                os.makedirs(todo_file_subdir)
        with open(os.path.join(args.save_dir, todo_file_name), 'w') as f:
            f.write(result_dict[todo_file_name])
        finished_code_file_list.append(todo_file_name)
    return result_dict

def print_coding_process_prompts(args: argparse.Namespace):
    # Considering the API cost, this function is helpful for chatting via the web
    # endpoints of DeepSeek, Tongyi, GPT-4 and Gemini. '**********' is the split
    # symbol between different prompt parts.
    with open(args.analysis_json, "r", encoding='utf-8') as f:
        analysis_json = json.load(f)

    file_list = [key for key in analysis_json.keys() if key.endswith(".py")]
    long_term_memory = analysis_json['long_term_memory']
    finished_code_file_list = []
    print(file_list)
    print("**********\nSystem Prompt:\n" + system_prompt + "\n\n-----\n\n" + long_term_memory)
    for idx, todo_file_name in enumerate(file_list):
        print("**********\n")
        print(f"Current Code File: {todo_file_name}")
        task_prompt = copy.deepcopy(detail_coding_prompt).format(todo_file_name=todo_file_name,
                                                                 detailed_logic_analysis=analysis_json[todo_file_name],
                                                                 done_file_list=str(finished_code_file_list))
        print(f"```markdown\n{task_prompt}\n```\n")
        finished_code_file_list.append(todo_file_name)

if __name__ == "__main__":
    args = parse_analysis_option()
    result_dict = coding_process(args)
    with open(args.coding_json, 'w', encoding='utf-8') as f:
        json.dump(result_dict, f)

# ------------------------------
# END OF CONTENT: Markdown-Paper2Code\coding_process.py
# ------------------------------


# ==============================
# START OF CONTENT: Markdown-Paper2Code\coding_prompts.py
# ==============================

system_prompt = """
**Role**: You are an expert researcher and software engineer with a deep understanding of experimental design and reproducibility in scientific research.
You will receive a research paper in JSON format, an overview of the plan, a design in JSON format consisting of "Implementation approach", "File list", "Data structures and interfaces", and "Program call flow", followed by a task in JSON format that includes "Required packages", "Required other language third-party packages", "Logic Analysis", and "Task list", along with a configuration file named "config.yaml".

**Task**: Your task is to write code to reproduce the experiments and methodologies described in the paper.
- The code you write must be elegant, modular, and maintainable, adhering to Google-style guidelines.
- The code must strictly align with the paper's methodology, experimental setup, and evaluation metrics.
- Write the code with triple quotes.
"""

detail_coding_prompt = """
## Format example
```python
## the code of the file '{todo_file_name}'
...
```

## Instruction
**Objective**: Based on the paper, plan, design, task and the previously specified configuration file (config.yaml), which are in your long-term memory, follow the "Format example" and write the code. We have {done_file_list}. Next you must write only the code of the file '{todo_file_name}' in Python.

**Constraints**:
1. Only ONE file: do your best to implement THIS ONLY ONE FILE.
2. COMPLETE CODE: Your code will be part of the entire project, so please implement complete, reliable, reusable code snippets.
3. Set default values: If there is any setting, ALWAYS SET A DEFAULT VALUE, ALWAYS USE STRONG TYPES AND EXPLICIT VARIABLES. AVOID circular imports.
4. Follow the design: YOU MUST FOLLOW "Data structures and interfaces". DON'T CHANGE ANY DESIGN. Do not use public member functions that do not exist in your design.
5. CAREFULLY CHECK THAT YOU DON'T MISS ANY NECESSARY CLASS/FUNCTION IN THIS FILE.
6. Before using an external variable/module, make sure you import it first.
7. Write out EVERY CODE DETAIL, DON'T LEAVE TODOs.
8. REFER TO CONFIGURATION: you must use the configuration from "config.yaml". DO NOT FABRICATE any configuration values.

**Detailed Logic Analysis Assistance**:
The detailed logic analysis to assist you in writing the code of the file '{todo_file_name}':
{detailed_logic_analysis}
"""

# ------------------------------
# END OF CONTENT: Markdown-Paper2Code\coding_prompts.py
# ------------------------------


# ==============================
# START OF CONTENT: Markdown-Paper2Code\plan_process.py
# ==============================

import argparse
import json
import openai

from utils import extract_data_from_json, extract_data_from_yaml
from plan_prompts import system_prompt, overall_plan_prompt, architecture_plan_prompt, logic_plan_prompt, configuration_plan_prompt

def parse_plan_option():
    arg_parser = argparse.ArgumentParser("Paper2Code-Plan Process")
    arg_parser.add_argument("--paper_markdown", type=str, default="./examples/example_paper.md",
                            help="Input markdown-formatted paper")
    arg_parser.add_argument("--base_url", type=str, default="http://localhost:3000/v1",
                            help="LLM API URL endpoint")
    arg_parser.add_argument("--model", type=str, default="QwQ-32B",
                            help="The model name of the local LLM")
    arg_parser.add_argument("--api_key", type=str, default="EMPTY",
                            help="API key of the local LLM")
    arg_parser.add_argument("--plan_json", type=str, default="./outputs/plan_result.json",
                            help="LLM planned json-formatted file")
    arg_parser.add_argument("--config_yaml", type=str, default="./outputs/config.yaml",
                            help="yaml-formatted config file for paper reproduction")
    args = arg_parser.parse_args()
    return args

def plan_process(args: argparse.Namespace):
    # init messages and long-term memory
    messages = [{'role': 'system', 'content': system_prompt}]
    long_term_memory = "## Long-term memory\n\n"

    # load the paper content
    with open(args.paper_markdown, "r", encoding='utf-8') as f:
        paper_markdown = f.read()
    result_dict = {'paper_content': paper_markdown}
    openai_server = openai.OpenAI(base_url=args.base_url, api_key=args.api_key, timeout=3000)

    # Overall plan process
    print("Process: Overall Plan\tResult Type: Markdown")
    long_term_memory += "**Paper Content**:\n```markdown\n" + paper_markdown + "\n```\n\n"
    messages.append({'role': 'user', 'content': long_term_memory + "-----\n\n" + overall_plan_prompt})
    result = openai_server.chat.completions.create(model=args.model,
                                                   messages=messages,
                                                   temperature=0.95,
                                                   timeout=3000)
    overall_plan_response = result.choices[0].message.content.strip()
    if "</think>" in overall_plan_response:
        overall_plan_response = overall_plan_response.split("</think>")[-1].strip()
    result_dict['overall_plan'] = overall_plan_response
    print("-----")

    # Architecture plan process
    print("Process: Architecture Plan\tResult Type: Json")
    long_term_memory += "**Overall Plan**:\n```markdown\n" + overall_plan_response + "\n```\n\n"
    messages[-1]['content'] = long_term_memory + "-----\n\n" + architecture_plan_prompt
    result = openai_server.chat.completions.create(model=args.model,
                                                   messages=messages,
                                                   temperature=0.95,
                                                   timeout=3000)
    architecture_plan_response = result.choices[0].message.content.strip()
    if "</think>" in architecture_plan_response:
        architecture_plan_response = architecture_plan_response.split("</think>")[-1].strip()
    result_dict['architecture_plan'] = extract_data_from_json(architecture_plan_response)
    long_term_memory += "**Architecture Design**:\n```markdown\n" + architecture_plan_response + "\n```\n\n"
    print(f"Keys: {result_dict['architecture_plan'].keys()}\n-----")

    # Logic plan process
    print("Process: Logic Plan\tResult Type: Json")
    messages[-1]['content'] = long_term_memory + "-----\n\n" + logic_plan_prompt
    result = openai_server.chat.completions.create(model=args.model,
                                                   messages=messages,
                                                   temperature=0.95,
                                                   timeout=3000)
    logic_plan_response = result.choices[0].message.content.strip()
    if "</think>" in logic_plan_response:
        logic_plan_response = logic_plan_response.split("</think>")[-1].strip()
    result_dict['logic_plan'] = extract_data_from_json(logic_plan_response)
    long_term_memory += "**Logic Design**:\n```markdown\n" + logic_plan_response + "\n```\n\n"
    print(f"Keys: {result_dict['logic_plan'].keys()}\n-----")

    # Configuration plan process
    print("Process: Configuration Plan\tResult Type: Yaml")
    messages[-1]['content'] = long_term_memory + "-----\n\n" + configuration_plan_prompt
    result = openai_server.chat.completions.create(model=args.model,
                                                   messages=messages,
                                                   temperature=0.95,
                                                   timeout=3000)
    configuration_plan_response = result.choices[0].message.content.strip()
    if "</think>" in configuration_plan_response:
        configuration_plan_response = configuration_plan_response.split("</think>")[-1].strip()
    result_dict['configuration_plan'] = extract_data_from_yaml(configuration_plan_response)
    with open(args.config_yaml, 'w', encoding='utf-8') as f:
        f.write(result_dict['configuration_plan'])
    print("-----")
    return result_dict


if __name__ == '__main__':
    args = parse_plan_option()
    result_dict = plan_process(args)
    with open(args.plan_json, 'w', encoding='utf-8') as f:
        json.dump(result_dict, f)

# ------------------------------
# END OF CONTENT: Markdown-Paper2Code\plan_process.py
# ------------------------------


# ==============================
# START OF CONTENT: Markdown-Paper2Code\plan_prompts.py
# ==============================

system_prompt = """
**Role**: You are an expert researcher and strategic planner with a deep understanding of experimental design and reproducibility in scientific research.
You will receive a research paper in MARKDOWN format.
Your task is to create a detailed and efficient plan to reproduce the experiments and methodologies described in the paper.
This plan should align precisely with the paper's methodology, experimental setup, and evaluation metrics.

**Instructions**:
1. Align with the Paper: Your plan must strictly follow the methods, datasets, model configurations, hyperparameters, and experimental setups described in the paper.
2. Be Clear and Structured: Present the plan in a well-organized and easy-to-follow format, breaking it down into actionable steps.
3. Prioritize Efficiency: Optimize the plan for clarity and practical implementation while ensuring fidelity to the original experiments.
"""

overall_plan_prompt = """
## Task
1. We want to reproduce the method described in the attached paper.
2. The authors did not release any official code, so we have to plan our own implementation.
3. Before writing any Python code, please outline a comprehensive plan that covers:
   - Key details from the paper's Methodology.
   - Important aspects of Experiments, including dataset requirements, experimental settings, hyperparameters, or evaluation metrics.
4. The plan should be as detailed and informative as possible to help us write the final code later.

## Requirements
- You don't need to provide the actual code yet; focus on a thorough, clear strategy.
- If something is unclear from the paper, mention it explicitly.

## Instruction
The response should give us a strong roadmap, making it easier to write the code later.
"""

architecture_plan_prompt = """
## Task
Your goal is to create a concise, usable, and complete software system design for reproducing the paper's method. Use appropriate open-source libraries and keep the overall architecture simple.

Based on the plan for reproducing the paper's main method, please design a concise, usable, and complete software system.
Keep the architecture simple and make effective use of open-source libraries.

## Format Example
```json
{"Implementation approach": "We will ...", "File list": ["main.py", "dataset_loader.py", "model.py", "trainer.py", "evaluation.py"], "Data structures and interfaces": "\nclassDiagram\n class Main {\n +__init__()\n +run_experiment()\n }\n class DatasetLoader {\n +__init__(config: dict)\n +load_data() -> Any\n }\n class Model {\n +__init__(params: dict)\n +forward(x: Tensor) -> Tensor\n }\n class Trainer {\n +__init__(model: Model, data: Any)\n +train() -> None\n }\n class Evaluation {\n +__init__(model: Model, data: Any)\n +evaluate() -> dict\n }\n Main --> DatasetLoader\n Main --> Trainer\n Main --> Evaluation\n Trainer --> Model\n","Program call flow": "\nsequenceDiagram\n participant M as Main\n participant DL as DatasetLoader\n participant MD as Model\n participant TR as Trainer\n participant EV as Evaluation\n M->>DL: load_data()\n DL-->>M: return dataset\n M->>MD: initialize model()\n M->>TR: train(model, dataset)\n TR->>MD: forward(x)\n MD-->>TR: predictions\n TR-->>M: training complete\n M->>EV: evaluate(model, dataset)\n EV->>MD: forward(x)\n MD-->>EV: predictions\n EV-->>M: metrics\n","Anything UNCLEAR": "Need clarification on the exact dataset format and any specialized hyperparameters."}
```

Nodes: "<node>: <type> # <instruction>"
- Implementation approach: <class 'str'> # Summarize the chosen solution strategy.
- File list: typing.List[str] # Only need relative paths. ALWAYS write a main.py or app.py here.
- Data structures and interfaces: typing.Optional[str] # Use mermaid classDiagram code syntax, including classes, methods (__init__ etc.) and functions with type annotations, CLEARLY MARK the RELATIONSHIPS between classes, and comply with PEP8 standards. The data structures SHOULD BE VERY DETAILED and the API should be comprehensive with a complete design.
- Program call flow: typing.Optional[str] # Use sequenceDiagram code syntax, COMPLETE and VERY DETAILED, using the CLASSES AND API DEFINED ABOVE accurately, covering the CRUD AND INIT of each object, SYNTAX MUST BE CORRECT.
- Anything UNCLEAR: <class 'str'> # Mention ambiguities and ask for clarifications.

## Constraint
Format: output wrapped in json format, ensure the result can be parsed by the json module correctly like the format example, nothing else.

## Action
Follow the instructions for the nodes, generate the output, and ensure it follows the format example.
"""

logic_plan_prompt = """
## Task
Your goal is to break down tasks according to the PRD/technical design, generate a task list, and analyze task dependencies.

You have outlined a clear PRD/technical design for reproducing the paper's method and experiments.

Now, let's break down tasks according to the PRD/technical design, generate a task list, and analyze task dependencies.
The Logic Analysis should not only consider the dependencies between files but also provide detailed descriptions to assist in writing the code needed to reproduce the paper.

## Format Example
```json
{"Required packages": ["numpy==1.21.0", "torch==1.9.0"], "Required Other language third-party packages": ["No third-party dependencies required"], "Logic Analysis": [["data_preprocessing.py", "DataPreprocessing class ........"],["trainer.py", "Trainer ....... "],["dataset_loader.py", "Handles loading and ........"],["model.py", "Defines the model ......."],["evaluation.py", "Evaluation class ........ "],["main.py", "Entry point ......."]], "Task list": ["dataset_loader.py", "model.py", "trainer.py", "evaluation.py", "main.py" ], "Full API spec": "openapi: 3.0.0 ...", "Shared Knowledge": "Both data_preprocessing.py and trainer.py share ........", "Anything UNCLEAR": "Clarification needed on recommended hardware configuration for large-scale experiments."}
```

Nodes: "<node>: <type> # <instruction>"
- Required packages: typing.Optional[typing.List[str]] # Provide required third-party packages in requirements.txt format (e.g., 'numpy==1.21.0').
- Required Other language third-party packages: typing.List[str] # List down packages required for non-Python languages. If none, specify "No third-party dependencies required".
- Logic Analysis: typing.List[typing.List[str]] # Provide a list of files with the classes/methods/functions to be implemented, including dependency analysis and imports. Include as much detailed description as possible.
- Task list: typing.List[str] # Break down the tasks into a list of filenames, prioritized based on dependency order. The task list must include the previously generated file list.
- Full API spec: <class 'str'> # Describe all APIs using OpenAPI 3.0 spec that may be used by both frontend and backend. If front-end and back-end communication is not required, leave it blank.
- Shared Knowledge: <class 'str'> # Detail any shared knowledge, like common utility functions or configuration variables.
- Anything UNCLEAR: <class 'str'> # Mention any unresolved questions or clarifications needed from the paper or project scope.

## Constraint
Format: output wrapped in json format strictly, ensure the result can be parsed correctly like the format example, nothing else.

## Action
Follow the node instructions above, generate your output accordingly, and ensure it follows the given format example.
"""

configuration_plan_prompt = """
## Long-term memory

**Paper Content**:
```markdown
{paper_markdown}
```

**Overall Plan**:
```markdown
{overall_plan}
```

**Architecture Plan**:
```json
{architecture_plan}
```

**Logic Plan**:
```json
{logic_plan}
```

## Task
You write elegant, modular, and maintainable code. Adhere to Google-style guidelines.

Based on the paper, plan and design specified previously, follow the "Format Example" and generate the code.
Extract the training details from the above paper (e.g., learning rate, batch size, epochs, etc.).
DO NOT FABRICATE DETAILS; only use what the paper provides.

You must write config.yaml.

ATTENTION: Use '##' to SPLIT SECTIONS, not '#'. Your output format must follow the example below exactly.

## Format Example
```yaml
## config.yaml
training:
  learning_rate: ...
  batch_size: ...
  epochs: ...
...
```

## Code: config.yaml
"""

# ------------------------------
# END OF CONTENT: Markdown-Paper2Code\plan_prompts.py
# ------------------------------


# ==============================
# START OF CONTENT: Markdown-Paper2Code\utils.py
# ==============================

import json
import re
from typing import Dict, List, Optional
import requests

def local_llm_request(model: str,
                      messages: List[Dict[str, str]],
                      url: str,
                      verify: Optional[bool] = False,
                      timeout: Optional[int] = None,
                      stream: Optional[bool] = False):
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer EMPTY'
    }
    if "/v1/chat/completions" not in url:
        url = url + "/v1/chat/completions"
    payload_dict = {
        "model": model,
        "messages": messages,
        "stream": stream,
        "temperature": 0.,
    }
    payload = json.dumps(payload_dict, ensure_ascii=False)
    result_payload = requests.post(url=url,
                                   data=payload.encode('utf-8'),
                                   headers=headers,
                                   verify=verify,
                                   timeout=timeout)
    response_dict = json.loads(result_payload.text)
    if 'choices' not in response_dict.keys():
        return f"Error Response: {response_dict}"
    return response_dict['choices'][0]['message']['content'], response_dict['usage']['total_tokens']

def remote_llm_request():
    pass

def extract_data_from_json(content):
    pattern = r'```json\s*(.*?)```'
    result = re.search(pattern, content, re.DOTALL)

    if result:
        extracted_code = result.group(1).strip()
    else:
        extracted_code = ""
        print("[WARNING] No json data found.")
    try:
        extracted_code = json.loads(extracted_code.strip())
    except:
        print("[WARNING] Invalid json data found.")
        try:
            extracted_code = eval(extracted_code.strip())
        except Exception as e:
            print(f"[ERROR] Invalid python data: {e}")
            return content.strip()
    return extracted_code

def extract_data_from_yaml(content):
    pattern = r'```yaml\s*(.*?)```'
    result = re.search(pattern, content.strip(), re.DOTALL)

    if result:
        extracted_code = result.group(1).strip()
    else:
        extracted_code = ""
        print("[WARNING] No yaml data found.")
    return extracted_code

def extract_code_from_python(content):
    pattern = r'```python\s*(.*?)```'
    result = re.search(pattern, content.strip(), re.DOTALL)

    if result:
        extracted_code = result.group(1).strip()
    else:
        extracted_code = ""
        print("[WARNING] No python code found.")
    return extracted_code

def extract_data_from_markdown(content):
    pattern = r'```markdown\s*(.*?)```'
    result = re.search(pattern, content.strip(), re.DOTALL)

    if result:
        extracted_code = result.group(1).strip()
    else:
        extracted_code = ""
        print("[WARNING] No markdown content found.")
    return extracted_code

# ------------------------------
# END OF CONTENT: Markdown-Paper2Code\utils.py
# ------------------------------
````

Thinking

Merging scripts into one

It looks like the user wants to combine multiple scripts into one single script. They’ve mentioned working on a project called ‘Markdown-Paper2Code,’ and they’re asking for the complete updated code in markdown format. My task is to merge the different scripts, which involves rethinking the structure of the project to make everything fit into one cohesive file. I'll focus on ensuring that it works smoothly as a unified script for the user, without any loose parts.

Merging scripts into one

The user wants the code from multiple files combined into one single script. This means merging six different scripts into a cohesive file, with logic and imports reorganized logically. We'll keep the existing structure, like classes and functions, and ensure no duplicate imports. Using argparse with subcommands for different processes, like plan, analysis, and coding, could be a good way to handle this. The final script, which I’ll call "paper2code.py," will include all the necessary functions and options, outputted in markdown format as requested.

Thought for 33 seconds

Answer

Here is a monolithic version that gathers the entire contents of the 7 Python files shown above into a single script.
It preserves all of the original logic and adds a simple CLI with subcommands (plan, analysis, coding, coding-prompts) so that each phase can still be run just as before.
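Before the full listing, a minimal sketch of the intended end-to-end flow; the file name `paper2code.py` is an assumption taken from the usage comments in the script header, and the input paths are the defaults of each phase:

```python
# A minimal sketch, assuming the merged script below was saved as paper2code.py:
# run the three phases in dependency order through the new subcommand CLI.
import subprocess

phases = [
    ("plan", ["--paper_markdown", "./examples/example_paper.md"]),
    ("analysis", ["--plan_json", "./outputs/plan_result.json"]),
    ("coding", ["--analysis_json", "./outputs/analysis_result.json"]),
]
for name, flags in phases:
    subprocess.run(["python", "paper2code.py", name, *flags], check=True)
```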

````python
#!/usr/bin/env python3
# ============================================================
# Markdown-Paper2Code – SINGLE-SCRIPT version
# Combines:
#   · plan_process.py      · plan_prompts.py
#   · analysis_process.py  · analysis_prompts.py
#   · coding_process.py    · coding_prompts.py
#   · utils.py
#
# Quick usage (examples):
#   python paper2code.py plan \
#       --paper_markdown ./examples/example_paper.md
#
#   python paper2code.py analysis \
#       --plan_json ./outputs/plan_result.json
#
#   python paper2code.py coding \
#       --analysis_json ./outputs/analysis_result.json
#
#   # just print the prompts for manual use
#   python paper2code.py coding-prompts \
#       --analysis_json ./outputs/analysis_result.json
# ============================================================
from __future__ import annotations

import argparse
import copy
import json
import os
import re
import sys
from typing import Any, Dict, List

import openai
import requests


# ============================================================
# UTILS (from utils.py)
# ============================================================
def local_llm_request(
    model: str,
    messages: List[Dict[str, str]],
    url: str,
    verify: bool | None = False,
    timeout: int | None = None,
    stream: bool | None = False,
) -> tuple[str, int] | str:
    """Sends a request to a self-hosted LLM exposing an OpenAI-compatible API."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer EMPTY",
    }
    if "/v1/chat/completions" not in url:
        url = url + "/v1/chat/completions"
    payload_dict = {
        "model": model,
        "messages": messages,
        "stream": stream,
        "temperature": 0.0,
    }
    payload = json.dumps(payload_dict, ensure_ascii=False)
    result_payload = requests.post(
        url=url,
        data=payload.encode("utf-8"),
        headers=headers,
        verify=verify,
        timeout=timeout,
    )
    response_dict = json.loads(result_payload.text)
    if "choices" not in response_dict:
        return f"Error Response: {response_dict}"
    return (
        response_dict["choices"][0]["message"]["content"],
        response_dict["usage"]["total_tokens"],
    )


# --- extraction helpers ----------------------------------------------------
def _extract_block(content: str, lang: str) -> str:
    """Extract a fenced ```<lang> ... ``` block from `content`.

    Returns an empty string if no such block is found.
    """
    pattern = rf"```{lang}\s*(.*?)```"
    result = re.search(pattern, content.strip(), re.DOTALL)
    if result:
        return result.group(1).strip()
    print(f"[WARNING] No {lang} block found.")
    return ""


def extract_data_from_json(content: str) -> Any:
    raw = _extract_block(content, "json")
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        print("[WARNING] Invalid JSON, trying eval()...")
        try:
            return eval(raw)
        except Exception as exc:
            print(f"[ERROR] Failed to evaluate JSON: {exc}")
            return raw


def extract_data_from_yaml(content: str) -> str:
    return _extract_block(content, "yaml")


def extract_code_from_python(content: str) -> str:
    return _extract_block(content, "python")


def extract_data_from_markdown(content: str) -> str:
    return _extract_block(content, "markdown")


# ============================================================
# PROMPTS (from the *_prompts.py files)
# ============================================================

# ---------- analysis_prompts.py ----------
ANALYSIS_SYSTEM_PROMPT = r"""
**Role**: You are an expert researcher, strategic analyzer and software engineer with a deep understanding of experimental design and reproducibility in scientific research.
You will receive a research paper in JSON format, an overview of the plan, a design in JSON format consisting of "Implementation approach", "File list", "Data structures and interfaces", and "Program call flow", followed by a task in JSON format that includes "Required packages", "Required other language third-party packages", "Logic Analysis", and "Task list", along with a configuration file named "config.yaml".

**Task**: Your task is to conduct a comprehensive logic analysis to accurately reproduce the experiments and methodologies described in the research paper. This analysis must align precisely with the paper's methodology, experimental setup, and evaluation criteria.
1. Align with the Paper: Your analysis must strictly follow the methods, datasets, model configurations, hyperparameters, and experimental setups described in the paper.
2. Be Clear and Structured: Present your analysis in a logical, well-organized, and actionable format that is easy to follow and implement.
3. Prioritize Efficiency: Optimize the analysis for clarity and practical implementation while ensuring fidelity to the original experiments.
4. Follow the design: YOU MUST FOLLOW "Data structures and interfaces". DON'T CHANGE ANY DESIGN. Do not use public member functions that do not exist in your design.
5. REFER TO CONFIGURATION: Always reference settings from the config.yaml file. Do not invent or assume any values; only use configurations explicitly provided.
"""

DETAIL_ANALYSIS_PROMPT = r"""
## Instruction
**Objective**: Write the logic analysis in '{todo_file_name}', which is intended for '{todo_file_desc}'.

**Constraints**:
* Conduct a Logic Analysis to assist in writing the code;
* Base it on the paper, plan, design, task and the previously specified configuration file (config.yaml), which are in your long-term memory;
* You DON'T need to provide the actual code yet; focus on a thorough, clear analysis;
* You only need to analyze the specific objective of the task, focused on the file '{todo_file_name}';
* Your result is markdown formatted like the output format below.

## Output Format
```markdown
Your detailed logic analysis on the file {todo_file_name}
...
```
"""

LONG_TERM_MEMORY_PROMPT = r"""
## Long-term memory

**Paper Content**:
```markdown
{markdown_paper}
```

**Overview of the plan**:
```markdown
{overall_plan}
```

**Design**:
```json
{architecture_plan}
```

**Task**:
```json
{logic_plan}
```

**Configuration file**:
```yaml
{config_yaml}
```
"""

# ---------- coding_prompts.py ----------

CODING_SYSTEM_PROMPT = r"""
**Role**: You are an expert researcher and software engineer with a deep understanding of experimental design and reproducibility in scientific research.
You will receive a research paper in JSON format, an overview of the plan, a design in JSON format consisting of "Implementation approach", "File list", "Data structures and interfaces", and "Program call flow", followed by a task in JSON format that includes "Required packages", "Required other language third-party packages", "Logic Analysis", and "Task list", along with a configuration file named "config.yaml".

**Task**: Your task is to write code to reproduce the experiments and methodologies described in the paper.
- The code you write must be elegant, modular, and maintainable, adhering to Google-style guidelines.
- The code must strictly align with the paper's methodology, experimental setup, and evaluation metrics.
- Write the code with triple quotes.
"""

DETAIL_CODING_PROMPT = r"""
## Format example
```python
## the code of the file '{todo_file_name}'
...
```

## Instruction
**Objective**: Based on the paper, plan, design, task and the previously specified configuration file (config.yaml), which are in your long-term memory, follow the "Format example" and write the code. We have {done_file_list}. Next you must write only the code of the file '{todo_file_name}' in Python.

**Constraints**:
1. Only ONE file: do your best to implement THIS ONLY ONE FILE.
2. COMPLETE CODE: Your code will be part of the entire project, so please implement complete, reliable, reusable code snippets.
3. Set default values: If there is any setting, ALWAYS SET A DEFAULT VALUE, ALWAYS USE STRONG TYPES AND EXPLICIT VARIABLES. AVOID circular imports.
4. Follow the design: YOU MUST FOLLOW "Data structures and interfaces". DON'T CHANGE ANY DESIGN. Do not use public member functions that do not exist in your design.
5. CAREFULLY CHECK THAT YOU DON'T MISS ANY NECESSARY CLASS/FUNCTION IN THIS FILE.
6. Before using an external variable/module, make sure you import it first.
7. Write out EVERY CODE DETAIL, DON'T LEAVE TODOs.
8. REFER TO CONFIGURATION: you must use the configuration from "config.yaml". DO NOT FABRICATE any configuration values.

**Detailed Logic Analysis Assistance**:
The detailed logic analysis to assist you in writing the code of the file '{todo_file_name}':
{detailed_logic_analysis}
"""

# ---------- plan_prompts.py ----------

PLAN_SYSTEM_PROMPT = r"""
**Role**: You are an expert researcher and strategic planner with a deep understanding of experimental design and reproducibility in scientific research.
You will receive a research paper in MARKDOWN format.
Your task is to create a detailed and efficient plan to reproduce the experiments and methodologies described in the paper.
This plan should align precisely with the paper's methodology, experimental setup, and evaluation metrics.

**Instructions**:
1. Align with the Paper: Your plan must strictly follow the methods, datasets, model configurations, hyperparameters, and experimental setups described in the paper.
2. Be Clear and Structured: Present the plan in a well-organized and easy-to-follow format, breaking it down into actionable steps.
3. Prioritize Efficiency: Optimize the plan for clarity and practical implementation while ensuring fidelity to the original experiments.
"""

OVERALL_PLAN_PROMPT = r"""
## Task
1. We want to reproduce the method described in the attached paper.
2. The authors did not release any official code, so we have to plan our own implementation.
3. Before writing any Python code, please outline a comprehensive plan that covers:
   - Key details from the paper's Methodology.
   - Important aspects of Experiments, including dataset requirements, experimental settings, hyperparameters, or evaluation metrics.
4. The plan should be as detailed and informative as possible to help us write the final code later.

## Requirements
- You don't need to provide the actual code yet; focus on a thorough, clear strategy.
- If something is unclear from the paper, mention it explicitly.

## Instruction
The response should give us a strong roadmap, making it easier to write the code later.
"""

ARCHITECTURE_PLAN_PROMPT = r"""
## Task
Your goal is to create a concise, usable, and complete software system design for reproducing the paper's method. Use appropriate open-source libraries and keep the overall architecture simple.

Based on the plan for reproducing the paper's main method, please design a concise, usable, and complete software system.
Keep the architecture simple and make effective use of open-source libraries.

## Format Example
```json
{"Implementation approach": "We will ...", "File list": ["main.py", "dataset_loader.py", "model.py", "trainer.py", "evaluation.py"], "Data structures and interfaces": "\nclassDiagram\n class Main {\n +__init__()\n +run_experiment()\n }\n class DatasetLoader {\n +__init__(config: dict)\n +load_data() -> Any\n }\n class Model {\n +__init__(params: dict)\n +forward(x: Tensor) -> Tensor\n }\n class Trainer {\n +__init__(model: Model, data: Any)\n +train() -> None\n }\n class Evaluation {\n +__init__(model: Model, data: Any)\n +evaluate() -> dict\n }\n Main --> DatasetLoader\n Main --> Trainer\n Main --> Evaluation\n Trainer --> Model\n","Program call flow": "\nsequenceDiagram\n participant M as Main\n participant DL as DatasetLoader\n participant MD as Model\n participant TR as Trainer\n participant EV as Evaluation\n M->>DL: load_data()\n DL-->>M: return dataset\n M->>MD: initialize model()\n M->>TR: train(model, dataset)\n TR->>MD: forward(x)\n MD-->>TR: predictions\n TR-->>M: training complete\n M->>EV: evaluate(model, dataset)\n EV->>MD: forward(x)\n MD-->>EV: predictions\n EV-->>M: metrics\n","Anything UNCLEAR": "Need clarification on the exact dataset format and any specialized hyperparameters."}
```

Nodes: "<node>: <type> # <instruction>"
- Implementation approach: <class 'str'> # Summarize the chosen solution strategy.
- File list: typing.List[str] # Only need relative paths. ALWAYS write a main.py or app.py here.
- Data structures and interfaces: typing.Optional[str] # Use mermaid classDiagram code syntax, including classes, methods (__init__ etc.) and functions with type annotations, CLEARLY MARK the RELATIONSHIPS between classes, and comply with PEP8 standards. The data structures SHOULD BE VERY DETAILED and the API should be comprehensive with a complete design.
- Program call flow: typing.Optional[str] # Use sequenceDiagram code syntax, COMPLETE and VERY DETAILED, using the CLASSES AND API DEFINED ABOVE accurately, covering the CRUD AND INIT of each object, SYNTAX MUST BE CORRECT.
- Anything UNCLEAR: <class 'str'> # Mention ambiguities and ask for clarifications.

## Constraint
Format: output wrapped in json format, ensure the result can be parsed by the json module correctly like the format example, nothing else.

## Action
Follow the instructions for the nodes, generate the output, and ensure it follows the format example.
"""

LOGIC_PLAN_PROMPT = r"""
## Task
Your goal is to break down tasks according to the PRD/technical design, generate a task list, and analyze task dependencies.

You have outlined a clear PRD/technical design for reproducing the paper's method and experiments.

Now, let's break down tasks according to the PRD/technical design, generate a task list, and analyze task dependencies.
The Logic Analysis should not only consider the dependencies between files but also provide detailed descriptions to assist in writing the code needed to reproduce the paper.

## Format Example
```json
{"Required packages": ["numpy==1.21.0", "torch==1.9.0"], "Required Other language third-party packages": ["No third-party dependencies required"], "Logic Analysis": [["data_preprocessing.py", "DataPreprocessing class ........"],["trainer.py", "Trainer ....... "],["dataset_loader.py", "Handles loading and ........"],["model.py", "Defines the model ......."],["evaluation.py", "Evaluation class ........ "],["main.py", "Entry point ......."]], "Task list": ["dataset_loader.py", "model.py", "trainer.py", "evaluation.py", "main.py" ], "Full API spec": "openapi: 3.0.0 ...", "Shared Knowledge": "Both data_preprocessing.py and trainer.py share ........", "Anything UNCLEAR": "Clarification needed on recommended hardware configuration for large-scale experiments."}
```

Nodes: "<node>: <type> # <instruction>"
- Required packages: typing.Optional[typing.List[str]] # Provide required third-party packages in requirements.txt format (e.g., 'numpy==1.21.0').
- Required Other language third-party packages: typing.List[str] # List down packages required for non-Python languages. If none, specify "No third-party dependencies required".
- Logic Analysis: typing.List[typing.List[str]] # Provide a list of files with the classes/methods/functions to be implemented, including dependency analysis and imports. Include as much detailed description as possible.
- Task list: typing.List[str] # Break down the tasks into a list of filenames, prioritized based on dependency order. The task list must include the previously generated file list.
- Full API spec: <class 'str'> # Describe all APIs using OpenAPI 3.0 spec that may be used by both frontend and backend. If front-end and back-end communication is not required, leave it blank.
- Shared Knowledge: <class 'str'> # Detail any shared knowledge, like common utility functions or configuration variables.
- Anything UNCLEAR: <class 'str'> # Mention any unresolved questions or clarifications needed from the paper or project scope.

## Constraint
Format: output wrapped in json format strictly, ensure the result can be parsed correctly like the format example, nothing else.

## Action
Follow the node instructions above, generate your output accordingly, and ensure it follows the given format example.
"""

CONFIGURATION_PLAN_PROMPT = r"""
## Long-term memory

**Paper Content**:
```markdown
{paper_markdown}
```

**Overall Plan**:
```markdown
{overall_plan}
```

**Architecture Plan**:
```json
{architecture_plan}
```

**Logic Plan**:
```json
{logic_plan}
```

## Task
You write elegant, modular, and maintainable code. Adhere to Google-style guidelines.

Based on the paper, plan and design specified previously, follow the "Format Example" and generate the code.
Extract the training details from the above paper (e.g., learning rate, batch size, epochs, etc.).
DO NOT FABRICATE DETAILS; only use what the paper provides.

You must write config.yaml.

ATTENTION: Use '##' to SPLIT SECTIONS, not '#'. Your output format must follow the example below exactly.

## Format Example
```yaml
## config.yaml
training:
  learning_rate: ...
  batch_size: ...
  epochs: ...
...
```

## Code: config.yaml
"""

# ============================================================
# PLAN PROCESS (from plan_process.py)
# ============================================================

def build_plan_parser() -> argparse.ArgumentParser:
    # Built with add_help=False so it can also be reused as a `parents=` entry
    # of a CLI subparser without clashing with the subparser's own -h/--help.
    parser = argparse.ArgumentParser("Paper2Code-Plan Process", add_help=False)
    parser.add_argument("--paper_markdown", type=str, default="./examples/example_paper.md",
                        help="Input markdown-formatted paper")
    parser.add_argument("--base_url", type=str, default="http://localhost:3000/v1",
                        help="LLM API URL endpoint")
    parser.add_argument("--model", type=str, default="QwQ-32B",
                        help="Model name of the local LLM")
    parser.add_argument("--api_key", type=str, default="EMPTY",
                        help="API key of the local LLM")
    parser.add_argument("--plan_json", type=str, default="./outputs/plan_result.json",
                        help="Output json-formatted plan file")
    parser.add_argument("--config_yaml", type=str, default="./outputs/config.yaml",
                        help="Generated YAML config file")
    return parser


def plan_parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    return build_plan_parser().parse_args(argv)


def plan_process(args: argparse.Namespace) -> dict[str, Any]:
    messages = [{"role": "system", "content": PLAN_SYSTEM_PROMPT}]
    long_term_mem = "## Long-term memory\n\n"

    # 1) Read the paper.
    with open(args.paper_markdown, "r", encoding="utf-8") as f:
        paper_md = f.read()
    result_dict: dict[str, Any] = {"paper_content": paper_md}
    long_term_mem += "**Paper Content**:\n```markdown\n" + paper_md + "\n```\n\n"

    client = openai.OpenAI(base_url=args.base_url, api_key=args.api_key, timeout=3000)

    # --- Overall Plan
    print("Process: Overall Plan\tResult Type: Markdown")
    messages.append({"role": "user", "content": long_term_mem + "-----\n\n" + OVERALL_PLAN_PROMPT})
    overall_plan = client.chat.completions.create(
        model=args.model,
        messages=messages,
        temperature=0.95,
        timeout=3000,
    ).choices[0].message.content.strip()
    overall_plan = overall_plan.split("</think>")[-1].strip()
    result_dict["overall_plan"] = overall_plan
    long_term_mem += "**Overall Plan**:\n```markdown\n" + overall_plan + "\n```\n\n"
    print("-----")

    # --- Architecture Plan
    print("Process: Architecture Plan\tResult Type: JSON")
    messages[-1]["content"] = long_term_mem + "-----\n\n" + ARCHITECTURE_PLAN_PROMPT
    architecture_plan_raw = client.chat.completions.create(
        model=args.model,
        messages=messages,
        temperature=0.95,
        timeout=3000,
    ).choices[0].message.content.strip()
    architecture_plan_raw = architecture_plan_raw.split("</think>")[-1].strip()
    architecture_plan = extract_data_from_json(architecture_plan_raw)
    result_dict["architecture_plan"] = architecture_plan
    long_term_mem += "**Architecture Design**:\n```markdown\n" + architecture_plan_raw + "\n```\n\n"
    print(f"Keys: {architecture_plan.keys()}\n-----")

    # --- Logic Plan
    print("Process: Logic Plan\tResult Type: JSON")
    messages[-1]["content"] = long_term_mem + "-----\n\n" + LOGIC_PLAN_PROMPT
    logic_plan_raw = client.chat.completions.create(
        model=args.model,
        messages=messages,
        temperature=0.95,
        timeout=3000,
    ).choices[0].message.content.strip()
    logic_plan_raw = logic_plan_raw.split("</think>")[-1].strip()
    logic_plan = extract_data_from_json(logic_plan_raw)
    result_dict["logic_plan"] = logic_plan
    long_term_mem += "**Logic Design**:\n```markdown\n" + logic_plan_raw + "\n```\n\n"
    print(f"Keys: {logic_plan.keys()}\n-----")

    # --- Configuration Plan
    print("Process: Configuration Plan\tResult Type: YAML")
    messages[-1]["content"] = long_term_mem + "-----\n\n" + CONFIGURATION_PLAN_PROMPT
    config_yaml_raw = client.chat.completions.create(
        model=args.model,
        messages=messages,
        temperature=0.95,
        timeout=3000,
    ).choices[0].message.content.strip()
    config_yaml_raw = config_yaml_raw.split("</think>")[-1].strip()
    result_dict["configuration_plan"] = extract_data_from_yaml(config_yaml_raw)
    with open(args.config_yaml, "w", encoding="utf-8") as fp:
        fp.write(result_dict["configuration_plan"])
    print("-----")

    return result_dict

# ============================================================
# ANALYSIS PROCESS (from analysis_process.py)
# ============================================================

def build_analysis_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser("Paper2Code-Analysis Process", add_help=False)
    parser.add_argument("--plan_json", type=str, default="./outputs/plan_result.json",
                        help="LLM planned json-formatted file")
    parser.add_argument("--base_url", type=str, default="http://localhost:3000/v1",
                        help="LLM API URL endpoint")
    parser.add_argument("--model", type=str, default="QwQ-32B",
                        help="Model name of the local LLM")
    parser.add_argument("--api_key", type=str, default="EMPTY",
                        help="API key of the local LLM")
    parser.add_argument("--analysis_json", type=str, default="./outputs/analysis_result.json",
                        help="Detailed LLM-analyzed json-formatted file")
    return parser


def analysis_parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    return build_analysis_parser().parse_args(argv)


def analysis_process(args: argparse.Namespace) -> dict[str, Any]:
    with open(args.plan_json, "r", encoding="utf-8") as fp:
        plan_json = json.load(fp)

    markdown_paper = plan_json["paper_content"]
    overall_plan = plan_json["overall_plan"]
    architecture_plan_str = json.dumps(plan_json["architecture_plan"])
    logic_plan_str = json.dumps(plan_json["logic_plan"])
    yaml_plan = plan_json["configuration_plan"]
    logic_analysis_list = plan_json["logic_plan"]["Logic Analysis"]

    long_term_memory = LONG_TERM_MEMORY_PROMPT.format(
        markdown_paper=markdown_paper,
        overall_plan=overall_plan,
        architecture_plan=architecture_plan_str,
        logic_plan=logic_plan_str,
        config_yaml=yaml_plan,
    )

    # Build one prompt per file listed in the logic analysis.
    task_prompts = [
        [
            file_name,
            copy.deepcopy(DETAIL_ANALYSIS_PROMPT).format(
                todo_file_name=file_name, todo_file_desc=file_desc
            ),
        ]
        for file_name, file_desc in logic_analysis_list
    ]

    result_dict: dict[str, Any] = {"long_term_memory": long_term_memory}
    messages = [
        {
            "role": "system",
            "content": ANALYSIS_SYSTEM_PROMPT + "\n\n-----\n\n" + long_term_memory,
        }
    ]
    client = openai.OpenAI(base_url=args.base_url, api_key=args.api_key, timeout=3000)

    for file_name, prompt in task_prompts:
        print(f"Current Analysing File: {file_name}")
        messages.append({"role": "user", "content": prompt})
        rsp = client.chat.completions.create(
            model=args.model,
            messages=messages,
            temperature=0.95,
            timeout=3000,
        ).choices[0].message.content.strip()
        rsp = rsp.split("</think>")[-1].strip()
        result_dict[file_name] = rsp
        messages.append({"role": "assistant", "content": rsp})

    return result_dict

# ============================================================
# CODING PROCESS (from coding_process.py)
# ============================================================

def build_coding_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser("Paper2Code-Coding Process", add_help=False)
    parser.add_argument("--analysis_json", type=str, default="./outputs/analysis_result.json",
                        help="Input analysis json")
    parser.add_argument("--base_url", type=str, default="https://api.deepseek.com/v1",
                        help="LLM API URL endpoint")
    parser.add_argument("--model", type=str, default="deepseek-chat",
                        help="Model name of the remote LLM")
    parser.add_argument("--api_key", type=str, default="",
                        help="API key of the remote LLM")
    parser.add_argument("--save_dir", type=str, default="./outputs",
                        help="Directory to save generated code")
    parser.add_argument("--coding_json", type=str, default="./outputs/coding_result.json",
                        help="LLM generated code json")
    return parser


def coding_parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    return build_coding_parser().parse_args(argv)


def coding_process(args: argparse.Namespace) -> dict[str, str]:
    with open(args.analysis_json, "r", encoding="utf-8") as fp:
        analysis_json = json.load(fp)

    file_list = [k for k in analysis_json if k.endswith(".py")]
    long_term_memory = analysis_json["long_term_memory"]
    messages = [
        {
            "role": "system",
            "content": CODING_SYSTEM_PROMPT + "\n\n-----\n\n" + long_term_memory,
        }
    ]
    finished_code_files_md = "## Finished Code Files\n"
    finished_list: list[str] = []
    result_dict: dict[str, str] = {}
    client = openai.OpenAI(base_url=args.base_url, api_key=args.api_key, timeout=3000)

    for idx, file_name in enumerate(file_list):
        print(f"Current Code File: {file_name}")
        task_prompt = copy.deepcopy(DETAIL_CODING_PROMPT).format(
            todo_file_name=file_name,
            detailed_logic_analysis=analysis_json[file_name],
            done_file_list=str(finished_list),
        )
        if idx == 0:
            messages.append({"role": "user", "content": task_prompt})
        else:
            messages[-1]["content"] = finished_code_files_md + "\n\n-----\n\n" + task_prompt
        rsp = client.chat.completions.create(
            model=args.model,
            messages=messages,
            temperature=0.95,
            timeout=3000,
        ).choices[0].message.content.strip()
        rsp = rsp.split("</think>")[-1].strip()
        finished_code_files_md += f"**{file_name}**:\n{rsp}\n"

        extracted_code = extract_code_from_python(rsp)
        result_dict[file_name] = extracted_code

        # Save the generated file, creating subdirectories as needed.
        save_path = os.path.join(args.save_dir, file_name)
        os.makedirs(os.path.dirname(save_path), exist_ok=True)
        with open(save_path, "w", encoding="utf-8") as fp:
            fp.write(extracted_code)

        finished_list.append(file_name)

    return result_dict

def print_coding_prompts(args: argparse.Namespace) -> None:
    """Print all prompts so they can be pasted manually into web endpoints."""
    with open(args.analysis_json, "r", encoding="utf-8") as fp:
        analysis_json = json.load(fp)

    file_list = [k for k in analysis_json if k.endswith(".py")]
    long_term_memory = analysis_json["long_term_memory"]
    print("**********\nSystem Prompt:\n" + CODING_SYSTEM_PROMPT + "\n\n-----\n\n" + long_term_memory)
    finished_list: list[str] = []
    for file_name in file_list:
        print("**********\n")
        task_prompt = copy.deepcopy(DETAIL_CODING_PROMPT).format(
            todo_file_name=file_name,
            detailed_logic_analysis=analysis_json[file_name],
            done_file_list=str(finished_list),
        )
        print(f"```markdown\n{task_prompt}\n```\n")
        finished_list.append(file_name)

# ============================================================
# MAIN CLI
# ============================================================

def main() -> None:
    parser = argparse.ArgumentParser(
        description="Markdown-Paper2Code – single script with subcommands"
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Each subcommand inherits the options of the corresponding phase parser
    # via `parents=` (the phase parsers are built with add_help=False, so
    # their options can be merged in without clashing with -h/--help).
    subparsers.add_parser("plan", parents=[build_plan_parser()],
                          help="Run the plan process")
    subparsers.add_parser("analysis", parents=[build_analysis_parser()],
                          help="Run the analysis process")
    subparsers.add_parser("coding", parents=[build_coding_parser()],
                          help="Run the coding process")

    # Subcommand coding-prompts (only prints the prompts)
    prompts_sp = subparsers.add_parser(
        "coding-prompts", help="Print the coding prompts (without generating code)"
    )
    prompts_sp.add_argument("--analysis_json", type=str,
                            default="./outputs/analysis_result.json",
                            help="Input analysis json")

    args = parser.parse_args()

    if args.command == "plan":
        res = plan_process(args)
        with open(args.plan_json, "w", encoding="utf-8") as fp:
            json.dump(res, fp)
        print(f"[OK] Plan saved to {args.plan_json}")
    elif args.command == "analysis":
        res = analysis_process(args)
        with open(args.analysis_json, "w", encoding="utf-8") as fp:
            json.dump(res, fp)
        print(f"[OK] Analysis saved to {args.analysis_json}")
    elif args.command == "coding":
        res = coding_process(args)
        with open(args.coding_json, "w", encoding="utf-8") as fp:
            json.dump(res, fp)
        print(f"[OK] Code saved to {args.coding_json}")
    elif args.command == "coding-prompts":
        print_coding_prompts(args)
    else:  # should never happen thanks to required=True
        parser.print_help()
        sys.exit(1)

if __name__ == "__main__":
    main()
````

### What changed?

1. **Everything in a single file (`paper2code.py`)**: each former file is marked off by comment blocks.
2. **Subcommands** via `argparse` keep the same workflow (`plan`, `analysis`, `coding`, `coding-prompts`).
3. **No name collisions**: the parsers keep unique names (`plan_parse_args`, `analysis_parse_args`, ...).
4. **Automatic directory creation** when saving generated code.
5. **Deduplicated imports** and centralized helpers.

Just save this snippet as `paper2code.py`, install the required dependencies (`openai`, `requests`), and keep using exactly the same workflow, now with a single script.
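If you prefer to drive a single phase programmatically instead of through the CLI, here is a minimal sketch (assuming the script was saved as `paper2code.py` and that `./outputs/plan_result.json` already exists from a previous `plan` run):

```python
# Hypothetical smoke test: import the merged script and run one phase directly.
# Assumes paper2code.py is importable and ./outputs/plan_result.json exists.
import json

from paper2code import analysis_parse_args, analysis_process

args = analysis_parse_args(["--plan_json", "./outputs/plan_result.json"])
result = analysis_process(args)
with open(args.analysis_json, "w", encoding="utf-8") as fp:
    json.dump(result, fp)
print(f"[OK] Analysis saved to {args.analysis_json}")
```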
