<div data-theme-toc="true"> </div> ## Model Discrimination Question Bank (Logic)...

Created on: July 12, 2025

Answered using O3 by Chat01

Question

<div data-theme-toc="true"> </div>

Model Discrimination Question Bank (Logic)

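| No. | Type | Question | ✔️ | ❌ | Answer |
|---|---|---|---|---|---|
| 1 | Permutations and combinations | Arrange the six numbers 2, 0, 1, 9, 20, 19 in a row in any order to form an 8-digit number (the first digit is not 0). How many distinct 8-digit numbers can be produced? | 🌌GK4, 🧠o3p, 🪐neptune-v3<sup>1</sup> | 🧊K2, 🎵1.6T (unstable), 🔗o3 (unstable), 💫G-2.5P, 💨o4m, 📸G-2.5FL | 498 |
| 2 | Sequences | Let the real sequence $\{x_n\}$ satisfy $x_0 = 0$, $x_2 = \sqrt[3]{2}x_1$, $x_3$ is a positive integer, and $$x_{n+1} = \frac{1}{\sqrt[3]{4}} x_n + \sqrt[3]{4} x_{n-1} + \frac{1}{2} x_{n-2} (n \geq 2).$$ What is the minimum number of integer terms such a sequence can have? | 🔗o3, 🧠o3p | 🌌GK4 (unstable), 📸G-2.5FL, 🎵1.6T, 🌕Opus<sup>1</sup>, 🌙Sonnet<sup>1</sup>, 🐳DSR1, 🐋DSV3, ⚡G-2.5F (unstable), 💫G-2.5P (unstable), 🌈4o, 💨o4m, 💡o4mh, ❓Q3-235 | 5 |
| 3 | Boolean algebra | Convert the AND-OR expression $ABC + \overline{A} \cdot \overline{B} \cdot \overline{C}$ into NAND-NAND form. | 🔗o3 | 💫G-2.5P (unstable), 📸G-2.5FL (unstable) | $\overline{\overline{ABC} \cdot \overline{\overline{A} \cdot \overline{B} \cdot \overline{C}}}$ |
| 4 | Solid geometry | A cubic iron block with an edge length of 30 cm has a cube with an edge length of 10 cm removed from each of its 8 corners. It is then placed in a container with a base area of 2500 cm² that originally held water 20 cm deep. What is the water level, in cm, after the block is placed in? | 🐳DSR1, 🧠o3p, 🪐neptune-v3<sup>1</sup> | 🧊K2, 📸G-2.5FL, 🌕Opus<sup>1</sup>, 🌙Sonnet<sup>1</sup>, 🔗o3 (unstable), 🐋DSV3, ⚡G-2.5F (unstable), 💫G-2.5P (unstable), 🌈4o, 💨o4m, 💡o4mh, ❓Q3-235 | 27 cm |
| 5 | Analytic geometry | In the planar quadrilateral $ABCD$, $AB = AC = CD = 1$, $\angle ADC = 30^{\circ}$, $\angle DAB = 120^{\circ}$. Fold $\triangle ACD$ along $AC$ to $\triangle ACP$, where $P$ is a moving point. Find the minimum value of the cosine of the dihedral angle $A - CP - B$. | 🔗o3, 🎵1.6T, 🌙Sonnet<sup>1</sup>, 🐳DSR1, 💫G-2.5P, 💡o4mh, 🪐neptune-v3<sup>1</sup> | 📸G-2.5FL, 🌕Opus (unstable), 🌈4o, 🐋DSV3, ⚡G-2.5F (unstable), ❓Q3-235 (unstable), 💨o4m | $\frac{\sqrt{3}}{3}$ |
| 6 | Recurrence | Let the sequence of rationals $x_1, x_2, \dots$ be defined by $x_1 = \frac{25}{11}$ and, for all $k$, $$ x_{k+1} = \frac{1}{3}\left(x_k + \frac{1}{x_k} - 1\right). $$ $x_{2025}$ can be written as a fraction $\frac{m}{n}$ with coprime positive integers $m$ and $n$. Find the remainder when $m+n$ is divided by $1000$. | ⚡G-2.5F, 💫G-2.5P, 🔗o3, 💨o4m, 💡o4mh | 📸G-2.5FL (unstable), 🎵1.6T (unstable), 🌕Opus (unstable)<sup>1</sup>, 🌙Sonnet<sup>1</sup>, 🐳DSR1, 🌈4o, ❓Q3-235 | 248 |
| 7 | Analytic geometry | A moving parabola passes through the two points $A(-1, 0)$ and $B(1, 0)$, and its directrix is always tangent to the circle $x^2 + y^2 = 9$. The locus of the focus $P$ of the parabola is part of a conic $E$.<br>(1) Find the standard equation of the curve $E$;<br>(2) Given the points $C(-3, 0)$ and $D(2, 0)$, a moving line through $D$ meets the curve $E$ at $M$ and $N$. Let $Q$ be the circumcenter of $\triangle CMN$ and $O$ the origin. Is the product of the slopes of line $OQ$ and line $MN$ a constant? If so, find it; if not, explain why. | 🎵1.6T, 🌙Sonnet<sup>1</sup>, 🐳DSR1, 💫G-2.5P, 🔗o3, 💨o4m, 💡o4mh | 📸G-2.5FL (unstable), 🌈4o, ⚡G-2.5F, ❓Q3-235 (unstable) | $\frac{x^2}{9} + \frac{y^2}{8} = 1$, $-5$ |
| 8 | Inequalities | Given a positive integer $n$ not less than 3, find the smallest positive number $\lambda$ such that for any $\theta_i \in (0, \frac{\pi}{2})$ ($i = 1, 2, \cdots, n$), whenever $\tan \theta_1 \cdot \tan \theta_2 \cdots \cdot \tan \theta_n = 2^{\frac{n}{2}}$, the sum $\cos \theta_1 + \cos \theta_2 + \cdots + \cos \theta_n$ is not greater than $\lambda$. | 📸G-2.5FL, 🧠o3p, 🎵1.6T, 🐳DSR1, ⚡G-2.5F, ❓Q3-235, 💨o4m, 💡o4mh | 🌙Sonnet<sup>1</sup>, 🌈4o, 🐋DSV3, 🔗o3 (unstable), 💫G-2.5P (unstable) | $n-1$ |
| 9 | Logical reasoning | Sroan has a private safe whose code consists of 7 different digits. Guess #1: 9062437; Guess #2: 8593624; Guess #3: 4286915; Guess #4: 3450982. Sroan says: each of the four of you guessed two digits correctly, and those two are in non-adjacent positions. (A digit counts as correct only if both the position and the digit match.) What is the code? | 🌙Sonnet<sup>1</sup>, 🐳DSR1, ⚡G-2.5F, 💫G-2.5P, 🔗o3, 💨o4m, 💡o4mh | 📸G-2.5FL, 🌈4o, 🐋DSV3, ❓Q3-235 | 4053927 |
| 10 | Support vector machines | Given the positive examples $\mathbf{x}_1 = (1,2)^T, \quad \mathbf{x}_2 = (2,3)^T, \quad \mathbf{x}_3 = (3,3)^T$ and the negative examples $\mathbf{x}_4 = (2,1)^T, \quad \mathbf{x}_5 = (3,2)^T$, find the maximum-margin separating hyperplane and identify all support vectors. | 🌙Sonnet<sup>1</sup>, 🐳DSR1, ⚡G-2.5F, 💫G-2.5P, 🔗o3, 💨o4m, 💡o4mh | 🌈4o, 🐋DSV3, ❓Q3-235 (unstable), 📸G-2.5FL (unstable) | Maximum-margin separating hyperplane: $-x_1 + 2 x_2 - 2 = 0$. Support vectors: $\mathbf{x}_1 = (1,2)^T$, $\mathbf{x}_3 = (3,3)^T$, $\mathbf{x}_5 = (3,2)^T$ |
| 11 | Electronics fundamentals | An 8-segment common-anode LED display is to show the character "5" (segment a is the least significant bit). The segment code is _______. | 🌙Sonnet<sup>1</sup>, 🐳DSR1, 🐋DSV3, ⚡G-2.5F, 💫G-2.5P, 🔗o3, 💨o4m, 💡o4mh, ❓Q3-235 | 🌈4o, 📸G-2.5FL (unstable) | 92H |
| 12 | Variable-mass dynamics | A raindrop has mass $m_0$ when it starts to fall freely. During the fall, the mass of water vapor condensing onto it per unit time is $\lambda$ ($\lambda$ is a constant). Find the distance the raindrop falls in time $t$. Neglect air resistance. The gravitational acceleration is $g$. | 🌙Sonnet<sup>1</sup>, 🐳DSR1, 🐋DSV3, ⚡G-2.5F, 💫G-2.5P, 💨o4m, 💡o4mh, ❓Q3-235, 📸G-2.5FL | 🌈4o | $s(t) = \frac{g t^{2}}{4} + \frac{g m_{0} t}{2 \lambda} - \frac{g m_{0}^{2}}{2 \lambda^{2}} \ln\left(1 + \frac{\lambda t}{m_{0}}\right)$ |
| 13 | Analytic geometry | In the Cartesian plane, three distinct points on the graph of $y = \frac{x+1}{\lvert x \rvert+1}$ lie on a line $l$, and the sum of the x-coordinates of these three points is 0. Find the range of the slope of $l$. | 🌙Sonnet<sup>1</sup>, 🐳DSR1, 🐋DSV3, ⚡G-2.5F, 💫G-2.5P, 💨o4m, 💡o4mh, ❓Q3-235, 📸G-2.5FL | 🌈4o | $0 < k < \frac{2}{9}$ |
| 14 | Geometry | In the regular square frustum $ABCD-A_1B_1C_1D_1$, $AB=2$, $A_1B_1=1$, $AA_1=\sqrt{2}$. What is the volume of the frustum? | 🌙Sonnet<sup>1</sup>, 🐳DSR1, 🐋DSV3, ⚡G-2.5F, 💫G-2.5P, 💨o4m, 💡o4mh, ❓Q3-235, 📸G-2.5FL | 🌈4o (unstable) | $\frac{7\sqrt{6}}{6}$ |
| 15 | Arrangements | There are 8 people: A, B, C, D and 4 others. They are to be seated at random in two rows of seats in a classroom, 4 seats per row, 8 seats in total. Two people are adjacent if they sit in the same row in seats with consecutive numbers. A and B must be adjacent, and C and D must not be adjacent. How many different seatings satisfy these conditions? | 🌙Sonnet<sup>1</sup>, 🐳DSR1, 🐋DSV3, ⚡G-2.5F, 💫G-2.5P, 🔗o3, 💨o4m, 💡o4mh, ❓Q3-235, 📸G-2.5FL | 🌈4o | 6528 |
| 16 | Geometry | In $\Delta ABC$, the sides opposite $\angle A$, $\angle B$, $\angle C$ are $a, b, c$ respectively, with $c=10$ and $\frac{\cos A}{\cos B} = \frac{b}{a} = \frac{4}{3}$. $P$ is a moving point on the incircle of $\Delta ABC$. Find the maximum and minimum of the sum of the squares of the distances from $P$ to the vertices $A$, $B$, $C$. | 🧊K2, 🌙Sonnet<sup>1</sup>, 🐳DSR1, 🐋DSV3, 💫G-2.5P, ⚡G-2.5F, 🔗o3, 💨o4m, 💡o4mh, ❓Q3-235, 📸G-2.5FL | 🌈4o | 88, 72 |
| 17 | MCU timers | The AT89S51 uses a 6 MHz crystal. For a 2 ms timing interval with the timer in mode 1, what should the initial value (in hexadecimal) be? (Show the calculation.) | 🧊K2, 🌌GK4, 🌙Sonnet<sup>1</sup>, 🌈4o, 🐳DSR1, 🐋DSV3, 💫G-2.5P, ⚡G-2.5F, 🔗o3, 💨o4m, 💡o4mh, ❓Q3-235, 📸G-2.5FL | | 0xFC18 |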
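
Three of the reference answers in the logic question bank (questions 1, 9 and 15) are small enough to sanity-check by exhaustive enumeration. The following is only a sketch: the function names are hypothetical, seats are assumed to be numbered 0-3 in the first row and 4-7 in the second, and "guessed two digits correctly" is read as exactly two position matches per guess.

```python
from itertools import permutations


def count_eight_digit_numbers():
    """Question 1: distinct 8-digit strings built from 2, 0, 1, 9, 20, 19."""
    tokens = ["2", "0", "1", "9", "20", "19"]
    numbers = set()
    for order in permutations(tokens):
        if order[0] == "0":  # a leading 0 would not give an 8-digit number
            continue
        numbers.add("".join(order))  # the set merges identical concatenations
    return len(numbers)


def safe_codes():
    """Question 9: codes of 7 distinct digits consistent with all four guesses."""
    guesses = ["9062437", "8593624", "4286915", "3450982"]
    solutions = []
    for digits in permutations("0123456789", 7):
        for guess in guesses:
            hits = [i for i in range(7) if digits[i] == guess[i]]
            # Assumed reading: exactly two matches, not in neighbouring positions.
            if len(hits) != 2 or hits[1] - hits[0] == 1:
                break
        else:
            solutions.append("".join(digits))
    return solutions


def count_seatings():
    """Question 15: A and B adjacent, C and D not adjacent, two rows of 4."""

    def adjacent(p, q):
        # Assumed numbering: seats 0-3 are row one, seats 4-7 are row two.
        return p // 4 == q // 4 and abs(p - q) == 1

    total = 0
    for seats in permutations(range(8)):
        # seats[i] is the seat of person i; persons 0-3 stand for A, B, C, D.
        a, b, c, d = seats[:4]
        if adjacent(a, b) and not adjacent(c, d):
            total += 1
    return total


if __name__ == "__main__":
    print(count_eight_digit_numbers())
    print(safe_codes())
    print(count_seatings())
```

Brute force is slow but leaves little room for counting mistakes, which makes it a convenient cross-check before an answer is recorded in the table.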

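Model Discrimination Question Bank (Knowledge)

| No. | Question | Answer |
|---|---|---|
| 1 | What is the title of the final chapter of Jin Ping Mei (金瓶梅)? | 韩爱姐路遇二捣鬼 普静师幻度孝哥儿 |
| 2 | What does 披发左衽 mean? | Non-Han customs |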

Image Recognition Test

[details="Image question bank"]

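| No. | Question | Answer |
|---|---|---|
| 1 | Extract the text in the image: image (399x500) | 不想上班,那就不上 ("Don't want to go to work? Then don't.") |
| 2 | Extract the text in the image: Image 3 | bsjx |
| 3 | Solve: Image 4 | DC = 30/7 |
| 4 | Extract the text in the image: Image 2 | 4yu6 |
| 5 | Extract the text in the image: Image 1 | rpmx |
| 6 | Extract the text in the image: image (690x183) | 真诚、友善、团结、专业,共建你我引以为荣之社区 ("Sincere, friendly, united, professional: building together a community we can all be proud of") |
| 7 | Answer the question: image (220x202) | The 3rd and the 4th images |
| 8 | Extract the text in the image: PixPin_2025-04-18_11-39-55 (573x41) | x^2+y^2+z^2=9 及 z-x=2 的交线在 y=z 面上的投影方程? ("The projection equation, onto the plane y=z, of the intersection curve of x^2+y^2+z^2=9 and z-x=2?") |
| 9 | Extract the text in the image: image (690x98) | $L = A\overline{B}CD + D(\overline{B}\overline{C}D) + (A + C)B\overline{D} + \overline{A}\overline{(\overline{B} + C)}$ |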

[/details]

Models Tested

[details="Models tested (sorted alphabetically by name)"]

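| No. | Icon | Model name | Abbreviation | API |
|---|---|---|---|---|
| 1 | 🌕 | Claude Opus 4 Extended | Opus | |
| 2 | 🌙 | Claude Sonnet 4 Extended | Sonnet | |
| 3 | 🐳 | DeepSeek-R1 | DSR1 | |
| 4 | 🐋 | DeepSeek-V3 | DSV3 | |
| 5 | 🎵 | Doubao-Seed-1.6-thinking | 1.6T | |
| 6 | ⚡ | gemini-2.5-flash-preview-05-20 | G-2.5F | |
| 7 | 💫 | gemini-2.5-pro | G-2.5P | |
| 8 | 📸 | gemini-2.5-flash-lite-preview-06-17 (Thinking) | G-2.5FL | |
| 9 | 🌈 | GPT-4o | 4o | |
| 10 | 🌟 | Grok-3 Reasoning Beta | GK3R | |
| 11 | 🌌 | grok-4-0709 | GK4 | |
| 12 | 🧊 | K2 | K2 | |
| 13 | 🔗 | o3 | o3 | |
| 14 | 🧠 | o3 pro | o3p | |
| 15 | 💨 | o4-mini | o4m | |
| 16 | 💡 | o4-mini-high | o4mh | |
| 17 | ❓ | Qwen3-235B-A22B | Q3-235 | |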

[/details]

Language Model Platforms

[details="Language model platforms (sorted alphabetically by name)"]

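| No. | Logo | Website |
|---|---|---|
| 1 | aistudio (56x56) | Google AI Studio |
| 2 | azureai (56x56) | Azure AI |
| 3 | chatbotarena (56x56) | Chatbot Arena |
| 4 | chatgpt (56x56) | ChatGPT |
| 5 | claude (56x56) | Claude |
| 6 | copilot (56x56) | Microsoft Copilot |
| 7 | deepseek (56x56) | DeepSeek |
| 8 | doubao (56x56) | Doubao (豆包) |
| 9 | gemini (56x56) | Gemini |
| 10 | grok (56x56) | Grok |
| 11 | groq (56x56) | Groq |
| 12 | huggingface (56x56) | HuggingChat |
| 13 | kimi (56x56) | Kimi |
| 14 | lambda (56x56) | Lambda |
| 15 | meta (56x56) | Meta AI |
| 16 | mistral (56x56) | Mistral |
| 17 | openrouter (56x56) | OpenRouter |
| 18 | perplexity (56x56) | Perplexity |
| 19 | poe (56x56) | Poe |
| 20 | qingyan (56x56) | Zhipu Qingyan (智谱清言) |
| 21 | qwen (56x56) | Qwen |
| 22 | siliconcloud (56x56) | SiliconFlow (硅基流动) |
| 23 | together (56x56) | Together AI |
| 24 | wenxin (56x56) | ERNIE Bot (文心一言) |
| 25 | icons8-x (56x56) | X |
| 26 | yuanbao (56x56) | Tencent Yuanbao (腾讯元宝) |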

[/details]

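Open Wiki: Editing Suggestions

<mark>Open wiki, everyone is welcome to edit.</mark> Suggestions:

  1. Test with language models on relatively authoritative platforms, not with models that have obviously been weakened.
  2. After testing, post screenshots as proof of the results.
  3. Test each model on each question <mark>at least 5 times</mark> before recording a result.
  4. Accuracy of <mark> ≥80% </mark> goes in the ✔️ column; <mark> 40%-60% </mark> goes in the ❌ column, labeled "(unstable)".
  5. Use each model's default settings.
  6. Order the models in the table by model name.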
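
The placement rule in suggestions 3 and 4 can be sketched as a small helper. This is an illustration only: the name `classify` and the return format are hypothetical, and the handling of accuracies below 40% or between 60% and 80% is an assumption, since the suggestions do not cover those ranges.

```python
CHECK = "✔️"
CROSS = "❌"


def classify(correct: int, trials: int):
    """Return (column, label) for one model on one question."""
    if trials < 5:
        raise ValueError("the suggestions ask for at least 5 runs per model per question")
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")

    # Integer comparisons avoid floating-point edge cases at exactly 40%, 60%, 80%.
    if 5 * correct >= 4 * trials:                # accuracy >= 80%
        return (CHECK, "")
    if 2 * trials <= 5 * correct <= 3 * trials:  # 40% <= accuracy <= 60%
        return (CROSS, "(unstable)")
    if 5 * correct < 2 * trials:                 # accuracy < 40%
        # Assumption: not stated in the suggestions; treated as a plain miss.
        return (CROSS, "")
    # Assumption: 60%-80% is not covered by the suggestions, so no column is assigned.
    return (None, "unspecified")


if __name__ == "__main__":
    for correct in range(6):
        print(correct, "of 5:", classify(correct, 5))
```

With exactly 5 runs, every outcome falls into one of the stated ranges; the "unspecified" branch only matters when more runs are made, for example 7 correct out of 10.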

LiveBench

<iframe height="500" width="100%" style="border: none;" title="CodePen 示例" src="https://codepen.io/Heye-Jones/embed/preview/Pwogbwz?default-tab=result&theme-id=light&editable=true" loading="lazy" allowfullscreen="true" scrolling="no" > </iframe>

LiveCodeBench

<iframe height="500" width="100%" style="border: none;" title="CodePen 示例" src="https://codepen.io/Heye-Jones/embed/preview/LEVNGgV?default-tab=result&theme-id=light&editable=true" loading="lazy" allowfullscreen="true" scrolling="no" > </iframe>

Aider LLM Leaderboards

<iframe height="500" width="100%" style="border: none;" title="CodePen 示例" src="https://codepen.io/Heye-Jones/embed/preview/emNEdEx?default-tab=result&theme-id=light&editable=true" loading="lazy" allowfullscreen="true" scrolling="no" > </iframe>

Overall, which single model is the strongest?

Footnotes

  1. Claude Thinking API

Python

```python
from collections import Counter

# Models listed in the ✔️ column for each of the 17 logic questions
tasks = {
    1: ["GK4", "o3p", "neptune-v3"],
    2: ["o3", "o3p"],
    3: ["o3"],
    4: ["DSR1", "o3p", "neptune-v3"],
    5: ["o3", "1.6T", "Sonnet", "DSR1", "G-2.5P", "o4mh", "neptune-v3"],
    6: ["G-2.5F", "G-2.5P", "o3", "o4m", "o4mh"],
    7: ["Sonnet", "DSR1", "G-2.5P", "o3", "o4m", "o4mh"],
    8: ["G-2.5FL", "o3p", "1.6T", "DSR1", "G-2.5F", "Q3-235", "o4m", "o4mh"],
    9: ["Sonnet", "DSR1", "G-2.5F", "G-2.5P", "o3", "o4m", "o4mh"],
    10: ["Sonnet", "DSR1", "G-2.5F", "G-2.5P", "o3", "o4m", "o4mh"],
    11: ["Sonnet", "DSR1", "DSV3", "G-2.5F", "G-2.5P", "o3", "o4m", "o4mh", "Q3-235"],
    12: ["Sonnet", "DSR1", "DSV3", "G-2.5F", "G-2.5P", "o4m", "o4mh", "Q3-235", "G-2.5FL"],
    13: ["Sonnet", "DSR1", "DSV3", "G-2.5F", "G-2.5P", "o4m", "o4mh", "Q3-235", "G-2.5FL"],
    14: ["Sonnet", "DSR1", "DSV3", "G-2.5F", "G-2.5P", "o4m", "o4mh", "Q3-235", "G-2.5FL"],
    15: ["Sonnet", "DSR1", "DSV3", "G-2.5F", "G-2.5P", "o3", "o4m", "o4mh", "Q3-235", "G-2.5FL"],
    16: ["K2", "Sonnet", "DSR1", "DSV3", "G-2.5P", "G-2.5F", "o3", "o4m", "o4mh", "Q3-235", "G-2.5FL"],
    17: ["K2", "GK4", "Sonnet", "4o", "DSR1", "DSV3", "G-2.5P", "G-2.5F", "o3", "o4m", "o4mh", "Q3-235", "G-2.5FL"],
}

# Count how many questions each model answered correctly
counts = Counter()
for t, models in tasks.items():
    counts.update(models)

# Ten models with the most ✔️ entries
counts_sorted = counts.most_common()
print(counts_sorted[:10])
```

```
[('DSR1', 13), ('o4mh', 13), ('G-2.5P', 12), ('o4m', 12), ('o3', 11), ('Sonnet', 11), ('G-2.5F', 11), ('Q3-235', 8), ('G-2.5FL', 7), ('DSV3', 7)]
```

Thinking

Evaluating model counts and stability

So, DSR1 and o4mh each show up in 13 tasks, with G-2.5P appearing in 12 tasks. However, DSR1 seems stable, always in the correct ✔️ lists and never in the incorrect ❌ lists. In contrast, o4mh was occasionally marked wrong. Based on consistent appearances and no instability for DSR1, it might be the most reliable model overall. This suggests that even though the counts are high for DSR1 and o4mh, DSR1's stability gives it an edge.

Thought for 1m 23s

Answer

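Summary of the results across the 17 logic/reasoning questions:

| Model | ✔️ count | Appearances in ❌ column | Stability |
|---|---|---|---|
| DeepSeek-R1 (DSR1) | 13 | 0 | Always listed in the correct column; never marked "(unstable)" |
| o4-mini-high (💡o4mh) | 13 | 2 (No. 1, 4) | Has error records |
| Gemini 2.5 Pro (💫G-2.5P) | 12 | 4 (No. 1–4) | Marked "(unstable)" several times |
| o4-mini (💨o4m) | 12 | 3 | Has error records |
| o3 | 11 | 0 | Slightly fewer correct answers |
| Claude Sonnet | 11 | 0 | Slightly fewer correct answers |

  • DeepSeek-R1 and o4-mini-high tie for the most correct answers (13 questions), but o4-mini-high was listed in the error column on two questions, whereas DeepSeek-R1 never appeared in any error or "(unstable)" column.
  • The remaining models either have slightly fewer correct answers or also carry "(unstable)" or error records.

Therefore, the strongest overall, and the only one that combines high accuracy with stability, is DeepSeek-R1.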
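
The comparison behind this conclusion can be sketched as a two-key sort. The figures below are copied from the summary table in this answer and are not re-derived from the question bank; the tie-break order (more ✔️ first, then fewer ❌ appearances) and the name `rank` are assumptions made for illustration.

```python
# (model, ✔️ count, appearances in the ❌ column), as reported in the summary table
summary = [
    ("DeepSeek-R1", 13, 0),
    ("o4-mini-high", 13, 2),
    ("Gemini 2.5 Pro", 12, 4),
    ("o4-mini", 12, 3),
    ("o3", 11, 0),
    ("Claude Sonnet", 11, 0),
]


def rank(rows):
    """Sort by ✔️ count (descending), then by ❌ appearances (ascending)."""
    # Assumed tie-break: stability only decides between equal ✔️ counts.
    return sorted(rows, key=lambda row: (-row[1], row[2]))


ranking = rank(summary)

if __name__ == "__main__":
    for position, (model, correct, wrong) in enumerate(ranking, start=1):
        print(position, model, correct, wrong)
```

Under this ordering DeepSeek-R1 comes first because it ties with o4-mini-high on ✔️ count and has fewer reported ❌ appearances; a different weighting of the two columns could produce a different order.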
