现在我有4个R数据框： S_GB：单性状、仅加性效应的预测准确度。 M_GB：多性状、仅加性效应的预...

Question

现在我有4个R数据框：
S_GB：单性状、仅加性效应的预测准确度。
M_GB：多性状、仅加性效应的预测准确度。
S_GBD：单性状、加性+显性效应的预测准确度。
M_GBD：多性状、加性+显性效应的预测准确度。
其中，S_GB，S_GBD三列：k_Fold Acc Trait
M_GB ,M_GBD 一列：ACC，行名为性状:[1] "ph" "ear" "mu" "ph1" "ear1" "mu1" "ph2" "ear2" "mu2"
[10] "ph3" "ear3" "mu3" "ph4" "ear4" "mu4" "ph5" "ear5" "mu5"
[19] "ph6" "ear6" "mu6" "ph7" "ear7" "mu7" "ph8" "ear8" "mu8"
[28] "ph9" "ear9" "mu9" "ph10" "ear10" "mu10" "ph11" "ear11" "mu11"
[37] "ph12" "ear12" "mu12" "ph13" "ear13" "mu13" "ph14" "ear14" "mu14"
[46] "ph15" "ear15" "mu15" "ph16" "ear16" "mu16" "ph17" "ear17" "mu17"
[55] "ph18" "ear18" "mu18" "ph19" "ear19" "mu19" "ph20" "ear20" "mu20"
[64] "ph21" "ear21" "mu21" "ph22" "ear22" "mu22" "ph23" "ear23" "mu23"
[73] "ph24" "ear24" "mu24"
请你据此帮我通过箱线图或其他你认为更好的展示方式为我比较不同条件下的预测结果，图片质量要求美观，符合顶刊发表级别水准。使用R代码实现。

Answer 1

下面给你一套**“可直接跑”的 R 代码，把 4 个数据框统一整理成长表，然后输出 3 张顶刊友好**的图（你可任选其一或同时发表）：

Fig1（推荐）：每个性状在不同条件下的“性状均值准确度”的雨云/小提琴+箱线对比（避免单性状因为有 k-fold 而“样本数更大”导致偏置）
Fig2：按时间点（ph/ear/mu + 编号）画趋势线 + 95%CI（单性状用 k-fold 计算 CI，多性状若只有 1 个值则 CI 为 0）
Fig3：显性效应带来的增益（GBD − GB）的差值分布图（非常适合写结果段落）

假设：

S_GB/S_GBD 列名：k_Fold Acc Trait

M_GB/M_GBD：只有一列 ACC，行名是 Trait

Trait 命名如 ph, ph1...ph24 等

r
## =========================
## 0) Packages
## =========================
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(stringr)
  library(tibble)
  library(ggplot2)
})

## 如果你希望 PDF 输出更“出版级”，建议装 showtext（可选）
## install.packages("showtext")
## library(showtext)
## showtext_auto()

## 色盲友好 Okabe-Ito 调色（顶刊常用）
okabe_ito <- c(
  "Single (A)"   = "#0072B2",
  "Single (A+D)" = "#D55E00",
  "Multi (A)"    = "#009E73",
  "Multi (A+D)"  = "#CC79A7"
)

## =========================
## 1) Helper functions
## =========================
.get_acc_col <- function(df) {
  nms <- names(df)
  idx <- which(tolower(nms) %in% c("acc", "accuracy"))
  if (length(idx) == 0) stop("找不到 Acc/ACC/Accuracy 列，请检查列名。")
  nms[idx[1]]
}

prep_single <- function(df, approach = "Single", model = "A") {
  stopifnot(all(c("k_Fold", "Acc", "Trait") %in% names(df)))
  df %>%
    transmute(
      Trait = as.character(Trait),
      Fold  = as.factor(k_Fold),
      Acc   = as.numeric(Acc),
      Approach = approach,
      Model = model
    )
}

prep_multi <- function(df, approach = "Multi", model = "A") {
  acc_col <- .get_acc_col(df)
  df %>%
    rownames_to_column("Trait") %>%
    transmute(
      Trait = as.character(Trait),
      Fold  = NA,
      Acc   = as.numeric(.data[[acc_col]]),
      Approach = approach,
      Model = model
    )
}

parse_trait <- function(trait_chr) {
  base <- str_extract(trait_chr, "^[A-Za-z]+")
  tp   <- str_extract(trait_chr, "\\d+$")
  tp_i <- suppressWarnings(as.integer(tp))
  tp_i[is.na(tp_i)] <- 0L  # ph/ear/mu 视为 timepoint=0
  tibble(base_trait = base, timepoint = tp_i)
}

## =========================
## 2) Combine four dataframes into long format
## =========================
acc_long <- bind_rows(
  prep_single(S_GB,  approach = "Single", model = "A"),
  prep_single(S_GBD, approach = "Single", model = "A+D"),
  prep_multi(M_GB,   approach = "Multi",  model = "A"),
  prep_multi(M_GBD,  approach = "Multi",  model = "A+D")
) %>%
  bind_cols(parse_trait(.$Trait)) %>%
  mutate(
    Condition = case_when(
      Approach == "Single" & Model == "A"   ~ "Single (A)",
      Approach == "Single" & Model == "A+D" ~ "Single (A+D)",
      Approach == "Multi"  & Model == "A"   ~ "Multi (A)",
      Approach == "Multi"  & Model == "A+D" ~ "Multi (A+D)"
    ),
    Condition = factor(Condition, levels = names(okabe_ito)),
    base_trait = factor(base_trait, levels = c("ph","ear","mu"))
  ) %>%
  filter(!is.na(Acc))

## =========================
## 3) Trait-level summary (关键：避免 k-fold 让单性状“权重更大”)
## =========================
trait_sum <- acc_long %>%
  group_by(Condition, Approach, Model, Trait, base_trait, timepoint) %>%
  summarise(
    mean_acc = mean(Acc, na.rm = TRUE),
    sd_acc   = sd(Acc, na.rm = TRUE),
    n        = sum(!is.na(Acc)),
    .groups  = "drop"
  ) %>%
  mutate(
    se  = ifelse(is.na(sd_acc) | n <= 1, 0, sd_acc / sqrt(n)),
    ci  = ifelse(n <= 1, 0, qt(0.975, df = n - 1) * se)
  )

## =========================
## 4) Publication theme
## =========================
theme_pub <- theme_classic(base_size = 13) +
  theme(
    legend.title = element_blank(),
    legend.position = "top",
    axis.title.x = element_text(margin = margin(t = 8)),
    axis.title.y = element_text(margin = margin(r = 8)),
    strip.background = element_blank(),
    strip.text = element_text(face = "bold"),
    plot.title = element_text(face = "bold", hjust = 0),
    plot.margin = margin(8, 10, 8, 10)
  )

## =========================
## FIG 1: 推荐——“雨云/小提琴 + 箱线 + 抖动点”
## 每个性状贡献一个值（mean_acc），公平对比四条件
## =========================
p1 <- ggplot(trait_sum, aes(x = Condition, y = mean_acc, fill = Condition)) +
  geom_violin(
    width = 0.95, trim = TRUE, alpha = 0.45,
    color = "grey25", linewidth = 0.35
  ) +
  geom_boxplot(
    width = 0.18, outlier.shape = NA, alpha = 0.75,
    color = "grey15", linewidth = 0.35
  ) +
  geom_jitter(
    aes(color = Condition),
    width = 0.10, height = 0, size = 1.0, alpha = 0.35
  ) +
  facet_wrap(~ base_trait, nrow = 1, scales = "free_y") +
  scale_fill_manual(values = okabe_ito) +
  scale_color_manual(values = okabe_ito) +
  labs(
    title = "Prediction accuracy comparison across conditions",
    x = NULL,
    y = "Accuracy (trait-level mean)"
  ) +
  theme_pub +
  guides(color = "none")

## =========================
## FIG 2: 趋势图（timepoint 0..24）+ 单性状 95%CI
## 多性状如果只有一个值，ci=0，误差条不会显著
## =========================
p2 <- ggplot(trait_sum, aes(x = timepoint, y = mean_acc, color = Condition)) +
  geom_line(linewidth = 0.6) +
  geom_point(size = 1.6) +
  geom_errorbar(aes(ymin = mean_acc - ci, ymax = mean_acc + ci),
                width = 0.22, linewidth = 0.35, alpha = 0.8) +
  facet_wrap(~ base_trait, ncol = 1, scales = "free_y") +
  scale_color_manual(values = okabe_ito) +
  scale_x_continuous(breaks = sort(unique(trait_sum$timepoint))) +
  labs(
    title = "Accuracy trends across timepoints (with 95% CI for k-fold single-trait)",
    x = "Timepoint index (0 = baseline trait name without suffix)",
    y = "Accuracy (mean ± 95% CI)"
  ) +
  theme_pub

## =========================
## FIG 3: 显性效应增益（A+D − A）差值图
## 强烈建议用于结果表述：dominance improves accuracy by ...
## =========================
delta <- trait_sum %>%
  select(Approach, base_trait, timepoint, Trait, Model, mean_acc) %>%
  pivot_wider(names_from = Model, values_from = mean_acc) %>%
  mutate(delta_AD = `A+D` - `A`) %>%
  filter(!is.na(delta_AD)) %>%
  mutate(Approach = factor(Approach, levels = c("Single","Multi")))

p3 <- ggplot(delta, aes(x = Approach, y = delta_AD, fill = Approach)) +
  geom_hline(yintercept = 0, linetype = "dashed", linewidth = 0.4, color = "grey35") +
  geom_violin(width = 0.9, trim = TRUE, alpha = 0.5, color = "grey25", linewidth = 0.35) +
  geom_boxplot(width = 0.18, outlier.shape = NA, alpha = 0.75, color = "grey15", linewidth = 0.35) +
  geom_jitter(width = 0.08, height = 0, size = 1.0, alpha = 0.35) +
  facet_wrap(~ base_trait, nrow = 1, scales = "free_y") +
  scale_fill_manual(values = c("Single" = "#0072B2", "Multi" = "#009E73")) +
  labs(
    title = "Dominance gain: (A+D) − A",
    x = NULL,
    y = "Δ Accuracy"
  ) +
  theme_pub +
  guides(fill = "none")

## =========================
## 5) Print plots
## =========================
p1
p2
p3

## =========================
## 6) Save (出版级：优先 PDF；位图用 600–900 dpi)
## =========================
ggsave("Fig1_violin_box_traitmean.pdf", p1, width = 9.2, height = 3.8, device = cairo_pdf)
ggsave("Fig2_trend_timepoints.pdf",      p2, width = 8.2, height = 7.5, device = cairo_pdf)
ggsave("Fig3_dominance_gain.pdf",        p3, width = 9.2, height = 3.8, device = cairo_pdf)

ggsave("Fig1_violin_box_traitmean.png", p1, width = 9.2, height = 3.8, dpi = 800)
ggsave("Fig2_trend_timepoints.png",     p2, width = 8.2, height = 7.5, dpi = 800)
ggsave("Fig3_dominance_gain.png",       p3, width = 9.2, height = 3.8, dpi = 800)

你写论文时怎么用这些图（建议）

Fig1：一句话总结“四种条件总体分布谁更好/更稳定”
Fig2：回答“不同时间点/阶段的优势是否一致”（审稿人很爱问）
Fig3：直接量化“显性效应是否提升准确度、提升多少、单/多性状是否一致”

如果你希望我把 Fig2 做成更“Nature/Science 风格”的版本（比如每个 timepoint 用 paired slopegraph 或者把三类性状拼成一个 panel 并统一 y 轴、加 A/B/C 面板编号），我也可以在你现有数据结构上再给一版更“版面级”的排版代码（patchwork/cowplot）。

现在我有4个R数据框： S_GB：单性状、仅加性效应的预测准确度。 M_GB：多性状、仅加性效应的预...

Вопрос

Мысль

Clarifying plot organization for user’s request

Proposing two complementary plots for user

Preparing R code for M data frames visualization

Crafting functions to convert and combine data

Proposing two plots for trait comparison and accuracy

Creating plots for trait-by-time comparison and delta effect

Preparing and saving plots with distribution and time trends

Ответ

你写论文时怎么用这些图（建议）

Поделиться этим Q&A