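Explain in detail what each of these files does. Which of them can produce a collision whose first 16 bytes differ and are all printable characters?

Created: March 2, 2026

Model used for the answer: GPT-5.2 Thinking by Chat01

Question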

README.md

cpc.sh

fastcpc.sh

generic_ipc.sh

textcoll.sh

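Explain in detail what each of these files does. Which of them can produce a collision whose first 16 bytes differ and are all printable characters?

Thinking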

Explaining script functions in Chinese and identifying collision possibilities

The user's asking about the scripts, specifically which of them can generate collisions where the first 16 bytes differ and are printable. Both cpc.sh and fastcpc.sh accept two prefix files with different prefixes, ensuring printable bytes. I’ll explain these functions in Chinese and clarify how each script handles this. Textcoll.sh ensures printable characters for collisions but uses the same prefix for both blocks.

Understanding byte differences in textcoll.sh

In textcoll.sh, the prefix file is optional, with a default dummy. It defines specific byte values at certain positions (like byte 21), causing differences that lead to collisions. The first 16 bytes aren't affected by this difference, but it can be adjusted by editing the script to focus on earlier bytes if needed.

Clarifying byte boundaries in text collision

It looks like the first 21 bytes in both final messages are likely identical ("TEXTCOLLBYMarcSt"). However, there seems to be a conflict where byte 20 is set to different values in different blocks – 'h' in one and 's' in another. This could be due to the way MD5 handles block boundaries, where the first 64-byte block doesn't always align perfectly with the second.

Clarifying block byte positions in text collision

It looks like they refer to bytes within specific 64-byte blocks. The first block's bytes seem to be fixed for certain positions, like byte 0-7. The second block's bytes, from byte 8 to 20, are set as well, while the first 8 bytes of the second block remain unconstrained, but with characters from the alphabet.

Clarifying collision structure for message prefixes

It seems that the first 16 bytes of both messages are mostly the same, with bytes 0-7 fixed to “TEXTCOLL” and bytes 8-15 variable but drawn from an alphabet. While there is a difference in byte 21 (due to a +4 difference), the first 16 bytes are not different between the two messages. The user wants those first 16 bytes to differ while being printable characters.

Explaining prefix collision scripts and adjustments

The default textcoll.sh doesn't make the first 16 bytes different, as they are likely identical. To do this, you'd have to customize the differential path and byte constraints, which is more advanced. Meanwhile, cpc.sh and fastcpc.sh can explicitly handle different prefixes to change the first 16 bytes.

Explaining chosen-prefix collisions, dependencies, and file details

cpc.sh and fastcpc.sh both perform chosen-prefix collisions, with fastcpc.sh being more advanced, using precomputed files and possibly GPU for speed. The README explains how to use them and their dependencies like BINDIR. Also, we need to explain which scripts can produce collisions with specific properties, such as having printable first 16 bytes, by customizing prefix files.

Explaining textcoll, chosen-prefix collisions, and ethical considerations

Textcoll enforces printability via the ALPHABET and predefined bytes, but defaults to differences at byte21, so the first 16 bytes may remain the same. For chosen-prefix collisions, if the prefix differs, the collision outputs can meet the requirement. It's crucial to note the ethical use and not bypass security measures.

Explaining chosen-prefix collisions and script capabilities

In short: md5_birthdaysearch likely processes file1 and file2 into 'file1.bin' and 'file2.bin', with added data but starting from the input bytes. It should preserve the first 16 bytes, but we can't be 100% sure without testing. For the user’s query, cpc.sh and fastcpc.sh can create the desired collisions.

Summarizing script functionalities and usage

README.md covers build requirements and script usage for chosen-prefix, identical-prefix, and text collisions.
cpc.sh controls chosen-prefix collision, using birthday search and iterative block computations.
fastcpc.sh optimizes the collision process, supporting multi-node setups.
generic_ipc.sh handles identical-prefix collisions, checking for multiple diff patterns.
textcoll.sh generates printable text collisions, ensuring specific byte differences.

Thought for 4m 5s

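Answer

Below I go through the files one by one, covering what each does, its inputs and outputs, and its key limits or tunable points, and then answer your question about which scripts can directly produce a collision whose first 16 bytes differ and are all printable.

README.md: project overview plus the entry points for three kinds of collision generation

The README explains that HashClash is a cryptanalysis toolbox for MD5/SHA-1, how to build it, and how to use the scripts to generate different kinds of collisions:

Chosen-Prefix Collision (CPC): give it two different prefix files and run ../scripts/cpc.sh <prefix1> <prefix2>. (README.md)
Identical-Prefix Collision (IPC): give it one prefix file and run ../scripts/generic_ipc.sh prefix.txt; it stresses that the prefix file length should preferably be a multiple of 64 bytes (optionally plus a few extra bytes used to force message words). (README.md)
Text collision (a collision in printable-text style): run ../scripts/textcoll.sh; it notes that you can change the alphabet, force specific bytes, and so on. (README.md)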
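
The 64-byte multiple mentioned for the IPC prefix corresponds to the MD5 block size. As a minimal sketch (the helper name `split_prefix_length` and the sample prefix are hypothetical, not part of HashClash), a prefix length can be split into whole blocks and leftover bytes like this:

```python
BLOCK_SIZE = 64  # MD5 processes messages in 64-byte blocks


def split_prefix_length(prefix: bytes) -> tuple[int, int]:
    """Return (number of full 64-byte blocks, number of trailing bytes)."""
    return divmod(len(prefix), BLOCK_SIZE)


# Hypothetical 130-byte prefix: two full blocks plus 2 trailing bytes.
full_blocks, trailing = split_prefix_length(b"A" * 130)
print(full_blocks, trailing)  # 2 2
```

A trailing count of 0 means the prefix ends exactly on a block boundary; how the scripts treat a non-zero remainder is not modeled here.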

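cpc.sh: general chosen-prefix collision pipeline (leans robust, with backtracking and timeouts)

Goal: for the two prefix files you supply (file1/file2), produce a pair of .coll files that have the same MD5 but different contents (a chosen-prefix collision).

What it does at its core (the script chains the HashClash binaries together):

Environment and parallelism setup: locates the tools under ../bin (md5_birthdaysearch, md5_diffpath*, etc.) and sets the thread count from the number of CPU cores. (cpc.sh)
Birthday search stage: runs md5_birthdaysearch --inputfile1 ... --inputfile2 ... on the two input prefixes (the script first "probes the worklevel" to decide whether the parameters should favor the GPU set or the memory set). (cpc.sh)
Iterative near-collision block generation: each round, dostepk:
- runs md5_diffpathhelper --startnearcollision file1_k.bin file2_k.bin to start the near-collision block search (cpc.sh)
- extends the differential path forward / backward, then connects the two halves, then uses --findcoll to find a pair coll1_* / coll2_*, and appends them to file1_(k+1).bin / file2_(k+1).bin respectively (cpc.sh)
Backtracking and automatic process killing: if a step times out or stalls, it is marked killed and the script backtracks; at the end it produces ${file1}.coll and ${file2}.coll and checks that their MD5 hashes match. (cpc.sh)

Output: ${file1}.coll and ${file2}.coll (the script prints their md5sum and reports success). (cpc.sh)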

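fastcpc.sh: a more engineered, accelerated chosen-prefix collision (precomputation plus timeout policy)

Goal: also a chosen-prefix collision (two prefix inputs, a pair of files with the same MD5 as output), but with more emphasis on speed and repeatability: it precomputes upper paths, sets a timeout for each near-collision block, and aborts outright when it stalls.

Key points:

Heavily parameterized: MAXBLOCKS / MAXMEMORY / UPPERPATHCOUNT / BLOCK_TIMEOUT, etc.; it also explains how to use the logs to measure near-collision block durations and tune BLOCK_TIMEOUT. (fastcpc.sh)
Upper path precomputation: precompute() generates the paths(TTT+4)… files ahead of time for the different diffs, computing and caching them if they do not exist. (fastcpc.sh)
Entry point: Usage: $0 <prefixfile1> <prefixfile2>, explicitly labeled "START CHOSEN-PREFIX COLLISION ATTACK". (fastcpc.sh)
Per near-collision block workflow: first --startnearcollision, then forward, with connect running in the background at the same time; once bestpaths.bin.gz appears it runs --findcollision, and when coll1_* is found it appends it to the next round's file; if the wait exceeds BLOCK_TIMEOUT it runs killall ... and exits. (fastcpc.sh)

Output: files like file1_(i+1).bin / file2_(i+1).bin; as soon as the MD5 hashes match it exits and reports success. (fastcpc.sh)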

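generic_ipc.sh: general identical-prefix collision (one prefix, outputs collision1/2.bin)

Goal: generate an identical-prefix collision: both messages start with the same prefix (the script builds the base from a single prefixfile), different blocks are appended after it, and the final MD5 hashes match.

Points to note about its structure:

It takes only one prefixfile=$1, defaulting to dummy.prefix.bin. (generic_ipc.sh)
It has three built-in sets of differential parameters (N=1/2/3) and selects the attack configuration via diff="--diffm...". (generic_ipc.sh)
First near-collision segment: starting from the prefix, --startpartialpathfromfile --inputfile1 $prefixfile --outputfile1 data/$prefixfile.base ... (note that it "normalizes/saves" the prefix as .base, and both messages are later built on top of this base). (generic_ipc.sh)
Two connect runs plus two --findcollision runs, after which it assembles collision1.bin / collision2.bin and checks that the MD5 hashes really match. (generic_ipc.sh)

Output: collision1.bin and collision2.bin, with md5sum/sha1sum printed (sha1 is only for comparison and does not mean a SHA-1 collision also occurred). (generic_ipc.sh)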

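textcoll.sh: generates a "printable text" MD5 collision (with an alphabet and forced bytes)

Goal: generate two collision results that "look like text strings" (the script outputs final_collision1.txt/final_collision2.txt), with strong constraints forcing many bytes to come from a printable character set.

Key mechanisms:

Defines ALPHABET (the set of printable characters) and restricts the search to that set. (textcoll.sh)
Uses FIRSTBLOCKBYTES/SECONDBLOCKBYTES to force certain bytes to specific characters (for example, fixing the start to something like TEXTCOLL...). (textcoll.sh)
Important restriction: the script states explicitly that byte21 carries a +4 difference, so you must make sure the alphabet contains a pair of characters that differ by 4 (the example is A/E). (textcoll.sh)
Three stages: prepare → firstblock → secondblock; at the end it assembles the two final_collision*.txt files, prints md5sum/sha1sum, and also cats both text contents. (textcoll.sh)
The example in the README likewise shows two texts with the same MD5 that differ at one character in the middle (Ac vs Ec). (README.md)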
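
The "pair of characters that differ by 4" requirement can be checked before editing the alphabet. The sketch below is an illustration only: the function name `pairs_differing_by` and the sample alphabet are hypothetical, and it assumes "differ by 4" means a plain difference of 4 between character codes, as the A/E example suggests.

```python
def pairs_differing_by(alphabet: str, delta: int = 4) -> list[tuple[str, str]]:
    """List every (low, high) pair in the alphabet with ord(high) - ord(low) == delta."""
    chars = set(alphabet)
    return sorted(
        (c, chr(ord(c) + delta)) for c in chars if chr(ord(c) + delta) in chars
    )


# Hypothetical three-letter alphabet, not the script's real ALPHABET value.
print(pairs_differing_by("ACE"))  # [('A', 'E')]
```

An empty result would mean the alphabet cannot host the differing byte under this assumption.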

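The collision you want, "first 16 bytes differ and are all printable": which scripts can do it?

✅ Can do it directly: cpc.sh, fastcpc.sh

The reason is straightforward: they are chosen-prefix collisions, and the input is two different prefix files (<prefixfile1> <prefixfile2>). (README.md) (fastcpc.sh)
So as long as you yourself make sure that the first 16 bytes of the two prefix files:

are all printable characters
and differ between the two files
then the resulting .coll files (or file1_*.bin/file2_*.bin) naturally satisfy "first 16 bytes differ and are printable" (at least the leading part is inherited from your prefixes; I suggest checking with hexdump after generation).

Note: these two scripts do not "enforce printability" for you; they simply take the prefixes you give them as input to the CPC.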
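
The two conditions on the prefixes can be verified mechanically. This is a minimal sketch, assuming "printable" means ASCII 0x20 through 0x7E (space included); the function names and the sample prefixes are hypothetical.

```python
def is_printable_ascii(data: bytes) -> bool:
    """True if every byte is in the printable ASCII range 0x20..0x7E (an assumption)."""
    return all(0x20 <= b <= 0x7E for b in data)


def heads_differ_and_printable(a: bytes, b: bytes, n: int = 16) -> bool:
    """Check that both inputs have n leading bytes, all printable, and not identical."""
    head_a, head_b = a[:n], b[:n]
    return (
        len(head_a) == n
        and len(head_b) == n
        and head_a != head_b
        and is_printable_ascii(head_a)
        and is_printable_ascii(head_b)
    )


# Hypothetical prefixes, each exactly 16 bytes long.
print(heads_differ_and_printable(b"PREFIX-ONE-00001", b"PREFIX-TWO-00002"))  # True
```

The same check can be run on the leading bytes of the generated output files to confirm they still start with the chosen prefixes.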

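❌ Cannot do it (at least not out of the box): generic_ipc.sh

It is an identical-prefix collision: there is only one prefixfile, and both messages are built on the same data/$prefixfile.base. (generic_ipc.sh)
So the first 16 bytes of the two outputs are necessarily identical, and "first 16 bytes differ" cannot be satisfied.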

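❌ Does not meet your "first 16 bytes differ" requirement by default: textcoll.sh

It does restrict a large number of bytes to ALPHABET (printable characters). (textcoll.sh)
But the "controllable difference point" in the script's design is at byte21 (the 22nd byte), and it explicitly states the +4 difference requirement for byte21. (textcoll.sh)
The README example also shows the difference appearing further along (not within the first 16 bytes). (README.md)

If you insist on the difference falling within the first 16 bytes, that means changing the attack constraints and the differential path; the current textcoll.sh configuration is not designed for this (changing a single parameter will not get you there).

If you like, I can also help you decide more specifically between cpc.sh and fastcpc.sh for your goal of "first 16 bytes printable and different" (it depends on whether your machine has CUDA, what kind of fail-and-rerun behavior you can accept, whether you want the precomputation cache, and so on), but that will not extend to using it in unauthorized settings.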

Question

Found collision!
ea7a96e3390f33b300b062e13efd25b3 collision1.bin
ea7a96e3390f33b300b062e13efd25b3 collision2.bin
8b94e4030b9e8db69bd1990a9208b43d763693b2 collision1.bin
19dabafb6c20a557c43dca73664807516c6f087f collision2.bin
4 -rw-rw-r-- 1 kali kali 128 3月 3日 03:40 collision1.bin
4 -rw-rw-r-- 1 kali kali 128 3月 3日 03:40 collision2.bin

┌──(kali㉿kali)-[~/Desktop/misc/hashclash-static-release-v1.2b/scripts]
└─$ xxd -l 16 collision2.bin
00000000: 4142 4344 4546 4748 494b 4b4c 4d4e 4f50 ABCDEFGHIKKLMNOP

┌──(kali㉿kali)-[~/Desktop/misc/hashclash-static-release-v1.2b/scripts]
└─$ xxd -l 16 collision1.bin
00000000: 4142 4344 4546 4748 494a 4b4c 4d4e 4f50 ABCDEFGHIJKLMNOP
But I don't think so.

Thinking

Crafting response and validating collision output

I don't need additional citations; we already have all the relevant lines. I'll start by confirming the user’s output: the MD5 matches, but SHA1 differs, confirming the collision is only for MD5. I’ll explain that the differing first 16 bytes are printable ASCII. Then, I'll explain how the script generates the collision, based on the empty prefix and the concatenation process. I’ll point to the relevant script lines for clarity.

Clarifying base and prefix behavior

I think I figured it out: If the base file is empty, the first bytes come from the collision blocks, so they can differ. My earlier claim about identical first 16 bytes depends on the prefix length. If it’s shorter than 16 bytes, differences may appear in the first 16 bytes. The 128-byte output strongly suggests the prefix is empty. I’ll explain this reasoning in a clear way.

Clarifying the process and limitations

I think I’ve figured out the reasoning: To confirm, check the prefix file size with ls -l dummy.prefix.bin and compare the first prefix_len bytes of the collision files using cmp -n <prefix_len>. But I want to be cautious — providing collision crafting commands could lead to misuse. I’ll keep things focused on clarification and conceptual info. It’s safest to mention that printability might only apply to the first 16 bytes — the rest likely contains non-printable characters.

Providing citations and clarifying prefix logic

To provide the necessary citations, I’ll mention that the script creates an empty prefix file if it's missing (generic_ipc touches the prefixfile, creating dummy.prefix.bin when absent). I’ll also clarify where prefix base is used, with citations to the corresponding script blocks. My earlier statement about collisions was incorrect — I'll correct it smoothly. I’ll keep the response concise and avoid further instructions on generating collisions, though I can mention that CPC scripts allow for more deterministic control via prefix specification.

Thought for 2m 35s

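Answer

In the output you pasted, the first 16 bytes do differ and are all printable ASCII; and this does not contradict the "identical-prefix" nature of generic_ipc.sh.

1) Where exactly do your two files differ?

xxd -l 16 shows:

collision1.bin: ... 49 4A 4B 4C ... (I J K L)
collision2.bin: ... 49 4B 4B 4C ... (I K K L)

So the 10th byte (offset 9, 0-based) changes from J (0x4A) to K (0x4B); the first 16 bytes therefore differ (in just one byte), and they are all A-Z, so they are certainly printable.

At the same time:

The MD5 hashes are equal → this is an MD5 collision.
The SHA1 hashes differ → entirely normal; it shows the two contents really are different and only MD5 has been made to collide.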
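
The offset can be reproduced from the 16 bytes shown in the xxd output. A small sketch (the helper name `differing_offsets` is hypothetical; the byte values are copied from the two dumps above):

```python
# First 16 bytes of each file, copied from the xxd output.
head1 = bytes.fromhex("4142434445464748494a4b4c4d4e4f50")  # collision1.bin
head2 = bytes.fromhex("4142434445464748494b4b4c4d4e4f50")  # collision2.bin


def differing_offsets(a: bytes, b: bytes) -> list[int]:
    """Return the 0-based offsets at which the two byte strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(differing_offsets(head1, head2))  # [9]
print(head1.decode("ascii"), head2.decode("ascii"))
```

Only the leading 16 bytes are compared here; differences later in the 128-byte files are outside what the pasted dumps show.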

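2) Why can an "identical-prefix collision" still have different first 16 bytes?

The key point: identical-prefix only guarantees that the segment coming from the prefix file you supplied is identical; it does not guarantee that the first 16 bytes are identical.

And generic_ipc.sh has a behavior that is easy to overlook: if you pass no argument it uses dummy.prefix.bin, and if that file does not exist it touches an empty file into existence. (generic_ipc.sh)
This means the length of your "identical prefix" may well be 0 bytes (an empty prefix).

When the script later builds its output, it uses cat to join data/$prefixfile.base (that is, the prefix) with the collision blocks coll*:

First it assembles file1.bin / file2.bin: cat data/$prefixfile.base ./data/coll*_...(generic_ipc.sh)
Then it assembles the final collision1.bin / collision2.bin: cat data/file*.bin data/coll* > collision*.bin(generic_ipc.sh)

So if prefixfile.base is empty, the file starts immediately with collision block content, and a difference can of course appear within the first 16 bytes. This does not violate "identical-prefix", because there is essentially no prefix to begin with.

In addition, both of your files are 128 bytes, which looks very much like the shortest possible assembly of just two 64-byte MD5 blocks (this is my inference from how the script concatenates). (generic_ipc.sh)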
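
The effect of an empty base on the output layout can be modeled with plain concatenation. This is only a sketch of the cat-style assembly described above: `assemble` is a hypothetical helper, and the blocks are placeholder bytes, not real collision blocks.

```python
BLOCK_SIZE = 64  # MD5 block size in bytes


def assemble(base: bytes, blocks: list[bytes]) -> bytes:
    """Concatenate the shared base with one message's blocks, like `cat` would."""
    return base + b"".join(blocks)


# Placeholder blocks only: these do NOT collide under MD5.
blocks1 = [b"\x01" * BLOCK_SIZE, b"\x02" * BLOCK_SIZE]
blocks2 = [b"\x03" * BLOCK_SIZE, b"\x04" * BLOCK_SIZE]

for base in (b"", b"P" * 64):
    out1, out2 = assemble(base, blocks1), assemble(base, blocks2)
    # Prints: base length, output length, whether the first 16 bytes match.
    print(len(base), len(out1), out1[:16] == out2[:16])
```

With an empty base the outputs are 128 bytes and their first 16 bytes come from the differing blocks; with a 64-byte base the outputs are 192 bytes and share their first 16 bytes.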

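3) Where was I wrong earlier?

Earlier I simplified "identical-prefix collision" to "the first 16 bytes must be identical", which silently assumes that the prefix is at least 16 bytes long.
Strictly speaking it should be: the first len(prefix) bytes are identical; if len(prefix) < 16 (and especially if it is 0), the first 16 bytes can of course differ.

If you want to quickly confirm for yourself how long the identical prefix really is, the most direct way is to check how big the prefix file you used actually was (especially the default dummy.prefix.bin). The README also notes that the prefix file is normally expected to be a multiple of 64 bytes (though the script does not force you to do so). (README.md)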
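
The corrected statement, "the first len(prefix) bytes are identical", can be tested directly by measuring the longest common prefix of the two outputs. A minimal sketch with hypothetical helper names, using the 16 bytes from the xxd dumps as sample data:

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest shared leading run of bytes."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def identical_prefix_holds(a: bytes, b: bytes, prefix_len: int) -> bool:
    """True if the two messages agree on their first prefix_len bytes."""
    return common_prefix_len(a, b) >= prefix_len


m1 = b"ABCDEFGHIJKLMNOP"  # leading bytes of collision1.bin
m2 = b"ABCDEFGHIKKLMNOP"  # leading bytes of collision2.bin
print(common_prefix_len(m1, m2))           # 9
print(identical_prefix_holds(m1, m2, 0))   # True: an empty prefix is trivially shared
print(identical_prefix_holds(m1, m2, 16))  # False: a 16-byte shared prefix is ruled out
```

A common prefix of 9 bytes is consistent with an empty prefix file and rules out any prefix of 16 bytes or more.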

Question

Thinking