找出上传文件中,姓名一列有重复的序号,并予以标注

बनाया गया: 22 जनवरी 2026

का उपयोग करके उत्तर दिया गया GPT-5.2 Thinking द्वारा Chat01

प्रश्न

2025年度企业参与研发项目科技人员汇总表.xlsx

找出上传文件中,姓名一列有重复的序号,并予以标注

Python

python
import openpyxl, os, pandas as pd, re, math, json, statistics path="/mnt/data/2025年度企业参与研发项目科技人员汇总表.xlsx" wb=openpyxl.load_workbook(path) wb.sheetnames ['Sheet1']

Python

python
ws=wb["Sheet1"] # find max row/col with data max_row=ws.max_row max_col=ws.max_column max_row, max_col (253, 15)

Python

python
def row_values(r, cmax=15): return [ws.cell(r,c).value for c in range(1,cmax+1)] for r in range(1,11): print(r, row_values(r, 15))

Python

python
import re, collections name_col=4 serial_col=1 duplicates=collections.defaultdict(list) for r in range(3, max_row+1): name=ws.cell(r,name_col).value serial=ws.cell(r,serial_col).value if name is None or str(name).strip()=="": continue key=re.sub(r'[\s\u3000]+', '', str(name).strip()) duplicates[key].append((r, serial, name)) dup_items={k:v for k,v in duplicates.items() if len(v)>1} len(dup_items), list(dup_items.items())[:3] (63, [('路严', [(3, 1, '路 严'), (152, 150, '路严')]), ('崔恒庆', [(4, 2, '崔恒庆'), (161, 159, '崔恒庆')]), ('杨兴坤', [(5, 3, '杨兴坤'), (155, 153, '杨兴坤')])])

Python

python
header=row_values(2, max_col) list(enumerate(header, start=1)) [(1, '序号'), (2, '单位'), (3, '部门、车间'), (4, '姓名'), (5, '性别'), (6, '身份证号'), (7, '职务(工种)'), (8, '参加工作时间'), (9, '毕业院校'), (10, '专业'), (11, '学历\n(文化程度)'), (12, '职称\n(技术等级)'), (13, '人员分类\n(干/工)'), (14, '参与研发项目'), (15, None)]

Python

python
from openpyxl.utils import get_column_letter dim=ws.column_dimensions[get_column_letter(14)].width dim15=ws.column_dimensions[get_column_letter(15)].width dim, dim15 (12.0, 13.0)

Python

python
from copy import copy h14=ws.cell(2,14) h15=ws.cell(2,15) h14.fill.patternType, h14.font.bold, h14.alignment.wrap_text, h14.border.left.style, h14.border.right.style, h15.value (None, True, True, 'thin', 'thin', None)

Python

python
d14=ws.cell(3,14) d15=ws.cell(3,15) (d14.border.left.style, d14.alignment.horizontal, d14.alignment.wrap_text, d15.border.left.style, d15.value) ('thin', 'center', True, None, None)

Python

python
ws.merged_cells.ranges {<MergedCellRange A1:M1>}

विचार

Creating summary and output file

The title row is merged across the first 13 columns. Column 14 is actually N, and it looks like titles should be across the first 13 columns, which is fine. I’ll check for any formulas, but it’s probably clear of them. I plan to create an output file, listing duplicates by serial number, and adding the top 20 duplicates in a table. I’ll include the full list in the sheet and give a brief summary of total duplicates. The summary might also include a name list. We can output a CSV for easy access as well.

Implementing steps for handling duplicates

I'll start by loading the workbook with openpyxl. I'll locate the header row containing '姓名' and '序号', likely in row 2. Then I'll identify an available column, either col15 or the next available one, and label it "姓名重复" (duplicates). I’ll copy the header style from "参与研发项目". For each data row, I'll mark duplicates with "重复" and apply the style from col14. I'll also highlight the serial and name cells for duplicates, preserving the borders and adjusting only the fill and font color.

Finalizing duplicate handling and output

I’ll create a new sheet titled "重复姓名汇总" with columns for both normalized and original names. For each duplicate, I'll collect the normalized and original names, along with serial numbers and occurrence counts. I’ll also format the header with bold text, freeze the panes, and adjust column widths. After saving, I’ll recalculate and ensure everything looks correct before rendering sheet images, especially for the first sheet. Finally, I’ll check column 14's existing styles and apply them as needed.

Thought for 1m 57s

Python

python
import openpyxl, re, collections, os from openpyxl.styles import PatternFill, Font from openpyxl.utils import get_column_letter from copy import copy import pandas as pd in_path = "/mnt/data/2025年度企业参与研发项目科技人员汇总表.xlsx" wb = openpyxl.load_workbook(in_path) ws = wb.active # --- Find header row and columns --- header_row = None name_col = None serial_col = None for r in range(1, min(ws.max_row, 30) + 1): row = [ws.cell(r, c).value for c in range(1, ws.max_column + 1)] if "姓名" in row and "序号" in row: header_row = r name_col = row.index("姓名") + 1 serial_col = row.index("序号") + 1 break if header_row is None: raise ValueError("未找到包含“姓名”和“序号”的表头行。") data_start = header_row + 1 # Pick a marking column (use first completely blank header cell if available, else append) mark_col = None for c in range(1, ws.max_column + 1): if ws.cell(header_row, c).value in (None, ""): mark_col = c break if mark_col is None: mark_col = ws.max_column + 1 # --- Build duplicate map (normalize by removing all whitespace, incl. fullwidth) --- dups = collections.defaultdict(list) for r in range(data_start, ws.max_row + 1): name = ws.cell(r, name_col).value serial = ws.cell(r, serial_col).value if name is None or str(name).strip() == "": continue key = re.sub(r'[\s\u3000]+', '', str(name).strip()) dups[key].append((r, serial, name)) dup_items = {k: v for k, v in dups.items() if len(v) > 1} # --- Styles --- # highlight: light red fill similar to Excel "Light Red Fill with Dark Red Text" dup_fill = PatternFill(fill_type="solid", fgColor="FFC7CE") dup_font = Font(color="9C0006") # dark red # Mark column header ws.cell(header_row, mark_col).value = "姓名重复" # Copy header style from previous visible header cell (prefer the one right before mark_col if exists) src_header_cell = ws.cell(header_row, max(1, mark_col - 1)) tgt_header_cell = ws.cell(header_row, mark_col) tgt_header_cell.font = copy(src_header_cell.font) tgt_header_cell.fill = copy(src_header_cell.fill) tgt_header_cell.border = copy(src_header_cell.border) tgt_header_cell.alignment = copy(src_header_cell.alignment) tgt_header_cell.number_format = src_header_cell.number_format tgt_header_cell.protection = copy(src_header_cell.protection) # Ensure mark column cells match data row style (copy from previous column in same row) for r in range(data_start, ws.max_row + 1): # only style rows that appear to have data in serial or name if ws.cell(r, serial_col).value is None and ws.cell(r, name_col).value is None: continue src = ws.cell(r, max(1, mark_col - 1)) tgt = ws.cell(r, mark_col) tgt.font = copy(src.font) tgt.fill = copy(src.fill) tgt.border = copy(src.border) tgt.alignment = copy(src.alignment) tgt.number_format = src.number_format tgt.protection = copy(src.protection) # Mark duplicates and highlight for key, rows in dup_items.items(): for (r, serial, name) in rows: ws.cell(r, mark_col).value = "重复" # highlight serial and name for c in (serial_col, name_col): cell = ws.cell(r, c) cell.fill = dup_fill # Keep existing font bold/size but make text dark red for visibility new_font = copy(cell.font) new_font.color = dup_font.color cell.font = new_font # --- Add a summary sheet --- summary_name = "重复姓名汇总" if summary_name in wb.sheetnames: del wb[summary_name] sum_ws = wb.create_sheet(summary_name) sum_ws.append(["姓名(去空格)", "出现次数", "序号列表", "行号列表", "姓名原样(示例)"]) for key in sorted(dup_items.keys(), key=lambda x: x): rows = dup_items[key] serials = [str(s) for (_, s, _) in rows if s is not None] rownums = [str(r) for (r, _, _) in rows] samples = sorted({str(n) for (_, _, n) in rows if n is not None}) sum_ws.append([key, len(rows), "、".join(serials), "、".join(rownums), ";".join(samples)]) # Format summary header hdr_fill = PatternFill(fill_type="solid", fgColor="D9E1F2") hdr_font = Font(bold=True) for c in range(1, 6): cell = sum_ws.cell(1, c) cell.fill = hdr_fill cell.font = hdr_font sum_ws.freeze_panes = "A2" # Set column widths for readability widths = [18, 10, 30, 22, 30] for i, w in enumerate(widths, start=1): sum_ws.column_dimensions[get_column_letter(i)].width = w # Save output out_path = "/mnt/data/2025年度企业参与研发项目科技人员汇总表_姓名重复已标注.xlsx" wb.save(out_path) # Build a DataFrame for quick display in chat UI (one row per duplicated name) rows_for_df = [] for key, rows in sorted(dup_items.items(), key=lambda kv: kv[0]): rows_for_df.append({ "姓名(去空格)": key, "出现次数": len(rows), "序号列表": "、".join([str(s) for (_, s, _) in rows if s is not None]), }) df = pd.DataFrame(rows_for_df) from caas_jupyter_tools import display_dataframe_to_user display_dataframe_to_user("姓名重复汇总(按“姓名”去空格后识别)", df) out_path '/mnt/data/2025年度企业参与研发项目科技人员汇总表_姓名重复已标注.xlsx'

Python

python
from artifact_tool import SpreadsheetArtifact out_path="/mnt/data/2025年度企业参与研发项目科技人员汇总表_姓名重复已标注.xlsx" artifact = SpreadsheetArtifact.load(out_path) artifact.recalculate() artifact.save(out_path) # render first sheet and summary for quick visual inspection rendered = artifact.render(sheet_name="Sheet1") rendered2 = artifact.render(sheet_name="重复姓名汇总") rendered, rendered2[:2] if isinstance(rendered2, list) else type(rendered2)

Python

python
from artifact_tool import SpreadsheetArtifact out_path="/mnt/data/2025年度企业参与研发项目科技人员汇总表_姓名重复已标注.xlsx" artifact=SpreadsheetArtifact.load(out_path) artifact.calculate() exported=artifact.export(out_path, overwrite=True) exported PosixPath('/mnt/data/2025年度企业参与研发项目科技人员汇总表_姓名重复已标注.xlsx')

Python

python
sheet=artifact.sheet("Sheet1") rng=sheet.minimum_range() img_path=rng.render() img_path

Python

python
sheet=artifact.sheet("Sheet1") img_path=sheet.range("A1:O40").render() img_path PosixPath('/tmp/granola_render_spreadsheet6959gelp/Sheet1__range-A1-O40.png')

Python

python
img2=artifact.sheet("重复姓名汇总").range("A1:E25").render() str(img2) '/tmp/granola_render_spreadsheet4a6essx4/___range-A1-E25.png'

Python

python
img2=artifact.sheet("重复姓名汇总").range("A1:E25").render() str(img2) '/tmp/granola_render_spreadsheet4a6essx4/___range-A1-E25.png' <<ImageDisplayed>>

Python

python
dup_name_count=len(dup_items) dup_row_count=sum(len(v) for v in dup_items.values()) dup_name_count, dup_row_count (63, 159)

विचार

dup_name_count=len(dup_items) dup_row_count=sum(len(v) for v in dup_items.values()) dup_name_count, dup_row_count

(63, 159)

Thought for 3m 30s

उत्तर

已在你上传的表里把**“姓名”列重复对应的“序号”**找出来并完成标注了(按“姓名去掉空格/全角空格后”识别重复)。

  • Sheet1:新增了 “姓名重复”(第 O 列)
    • 该行姓名重复的,填 “重复”
    • 同时把该行的 序号姓名 单元格用浅红色高亮标注
  • 另外新增了一个汇总表 “重复姓名汇总”:按重复姓名列出出现次数、序号列表等

共识别到 63 个重复姓名,涉及 159 行被标注。

下载已标注文件

इस प्रश्नोत्तर को साझा करें