I now have 4 R data frames: S_GB: single-trait, additive-effects-only prediction accuracy. M_GB: multi-trait, additive-effects-only pre...

Created: 24 January 2026

Answered using GPT-5.2 Thinking by Chat01

Question

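I now have 4 R data frames:
S_GB: prediction accuracy for single-trait models with additive effects only.
M_GB: prediction accuracy for multi-trait models with additive effects only.
S_GBD: prediction accuracy for single-trait models with additive + dominance effects.
M_GBD: prediction accuracy for multi-trait models with additive + dominance effects.
S_GB and S_GBD have three columns: k_Fold Acc Trait
M_GB and M_GBD have a single column, ACC, with the traits as row names:
 [1] "ph"    "ear"   "mu"    "ph1"   "ear1"  "mu1"   "ph2"   "ear2"  "mu2"  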
[10] "ph3"   "ear3"  "mu3"   "ph4"   "ear4"  "mu4"   "ph5"   "ear5"  "mu5"  
[19] "ph6"   "ear6"  "mu6"   "ph7"   "ear7"  "mu7"   "ph8"   "ear8"  "mu8"  
[28] "ph9"   "ear9"  "mu9"   "ph10"  "ear10" "mu10"  "ph11"  "ear11" "mu11" 
[37] "ph12"  "ear12" "mu12"  "ph13"  "ear13" "mu13"  "ph14"  "ear14" "mu14" 
[46] "ph15"  "ear15" "mu15"  "ph16"  "ear16" "mu16"  "ph17"  "ear17" "mu17" 
[55] "ph18"  "ear18" "mu18"  "ph19"  "ear19" "mu19"  "ph20"  "ear20" "mu20" 
[64] "ph21"  "ear21" "mu21"  "ph22"  "ear22" "mu22"  "ph23"  "ear23" "mu23" 
[73] "ph24"  "ear24" "mu24" 
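Based on this, please compare the prediction results across the different conditions using boxplots or any other presentation you think works better. The figures should look polished and meet top-journal publication standards. Please implement this in R code.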

Thinking

Clarifying plot organization for user’s request

The user is asking for R code to create publication-quality plots comparing different conditions. To handle models (S_GB, S_GBD, M_GB, M_GBD), I plan to unify data into a long format, with factors like Approach, GeneticModel, Trait, Fold, and Accuracy. For folds in M_* models, I'll set Fold to NA or 1. Multiple replicates could be involved, and I’ll ensure the R code accommodates all this complexity.

Proposing two complementary plots for user

I suggest creating two figures. First, a distribution plot with a violin, box, and jitter, faceted by TraitGroup and time points. We could use ggplot2 with a publication-ready theme and possibly ggh4x. Second, a paired comparison plot using a raincloud plot with paired differences to show improvements. For parsing trait names, I’d use stringr.

Preparing R code for M data frames visualization

For M data frames, I’ll convert the rownames (traits) into a column, using the tidyverse packages like dplyr, tidyr, and stringr. We can use ggplot2 for plotting, cowplot for combining, and maybe ggpubr. I'll set a journal-quality theme (e.g., theme_bw) with a colorblind-friendly palette and explain how to output high-res plots with ggsave at a dpi of 600.

Crafting functions to convert and combine data

I’ll write functions to convert the ‘S’ and ‘M’ data frames. The ‘S’ function will expect columns like k_Fold, Acc, and Trait, while adding columns for Approach and Genetic. The ‘M’ function will convert ACC and row names into traits, then use Approach = "Multi". I’ll parse the Trait column for base traits and time, then factor the conditions. Finally, I’ll create a faceted plot for each base trait across time points.

Proposing two plots for trait comparison and accuracy

For visualizing time (0-24), we can order traits by base and time, using a trait label combining base and time. Instead of cluttering the plot with too many boxes, I’ll facet by base_trait, with time on the x-axis and accuracy on the y-axis. For "S" data, I'll show boxplots; for "M," I'll use points. For fair comparison, I’ll propose two plots: one comparing distributions for S data (boxplots) and another comparing trait-level accuracy with points and error bars for S and M.

Creating plots for trait-by-time comparison and delta effect

I’ll create two plots: one showing a line plot for each base_trait, with time on the x-axis, accuracy on the y-axis, and different conditions marked by lines. For S data, I'll include error ribbons, and for M, points. I’ll also make a delta plot comparing S_GBD vs S_GB, showing paired differences.

Preparing and saving plots with distribution and time trends

To handle missing time data, I'm treating it as 0 and setting breaks from 0 to 24 on the x-axis. For folds in the S model, I’m considering a grey background to represent the fold range, but it feels too heavy. I'll save the figures with proper adjustments using ggplot2, embedding fonts when needed. For consistency, I'll create helper functions and clean the data, making sure the plot rules are clear and adjusting for the differing sample sizes.

Thought for 53s

Answer

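Below is a set of **ready-to-run** R code that reshapes the 4 data frames into one long table and then produces 3 **journal-friendly** figures (you can pick one or publish all of them):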

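  • Fig1 (recommended): violin + boxplot (raincloud-style) comparison of the per-trait mean accuracy under each condition (this avoids the bias that arises when single-trait runs carry a larger sample size because of k-fold)
  • Fig2: trend lines + 95% CI by timepoint (ph/ear/mu + index); for single-trait models the CI is computed from the k folds, and for multi-trait models the CI is 0 when there is only one value
  • Fig3: distribution of the gain from dominance effects, expressed as the difference (GBD − GB); this is well suited to writing the results paragraph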
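To make the Fig2 interval concrete, here is a minimal sketch of the calculation, assuming the usual t-based interval mean ± t(0.975, k − 1) · sd / √k over the k fold accuracies of one trait. The helper name `ci95` and the fold values are hypothetical and only serve as an illustration. Because the folds share training data, such an interval is best read as approximate.

```r
# Hypothetical helper: t-based 95% CI from the fold accuracies of one trait.
# With a single value (the multi-trait case) the interval collapses to the mean.
ci95 <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  if (n <= 1) {
    return(c(mean = m, lower = m, upper = m))
  }
  half_width <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  c(mean = m, lower = m - half_width, upper = m + half_width)
}

acc_folds <- c(0.52, 0.48, 0.55, 0.50, 0.47)  # made-up 5-fold accuracies
single_ci <- ci95(acc_folds)  # interval around the mean of the five folds
multi_ci  <- ci95(0.60)       # one value only: lower == upper == mean
print(single_ci)
print(multi_ci)
```

The same rule is what makes the multi-trait error bars invisible in a trend plot: with one value per trait the half-width is set to zero rather than left undefined.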

Assumptions:

  • S_GB/S_GBD column names: k_Fold Acc Trait
  • M_GB/M_GBD: a single column ACC, with Trait as the row names
  • Traits are named like ph, ph1...ph24
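If you want to dry-run the pipeline before plugging in real results, the sketch below builds four toy data frames with the layout assumed above. Everything in it is made up for illustration: the number of folds (`k = 5`), the simulated accuracies, and the helper names `make_single` and `make_multi`.

```r
set.seed(1)

# 75 trait names in the same order as the row names in the question:
# ph, ear, mu, ph1, ear1, mu1, ..., ph24, ear24, mu24
base   <- c("ph", "ear", "mu")
traits <- c(base, paste0(base, rep(1:24, each = 3)))

k <- 5  # hypothetical number of cross-validation folds

# Single-trait layout: one row per fold x trait, columns k_Fold / Acc / Trait
make_single <- function(shift = 0) {
  df <- expand.grid(k_Fold = seq_len(k), Trait = traits,
                    stringsAsFactors = FALSE)
  df$Acc <- pmin(pmax(rnorm(nrow(df), mean = 0.50 + shift, sd = 0.05), 0), 1)
  df[, c("k_Fold", "Acc", "Trait")]
}

# Multi-trait layout: a single ACC column, traits stored as row names
make_multi <- function(shift = 0) {
  acc <- pmin(pmax(rnorm(length(traits), mean = 0.55 + shift, sd = 0.05), 0), 1)
  data.frame(ACC = acc, row.names = traits)
}

S_GB  <- make_single()
S_GBD <- make_single(shift = 0.02)
M_GB  <- make_multi()
M_GBD <- make_multi(shift = 0.02)

str(S_GB)
head(M_GB)
```

Replace these objects with your own data frames once the figures render as expected; the simulated values carry no biological meaning.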

r
## =========================
## 0) Packages
## =========================
## Note: the `linewidth` argument used below requires ggplot2 >= 3.4.0
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(stringr)
  library(tibble)
  library(ggplot2)
})

## For more publication-grade PDF output, consider installing showtext (optional)
## install.packages("showtext")
## library(showtext)
## showtext_auto()

## Colorblind-friendly Okabe-Ito palette (commonly used in top journals)
okabe_ito <- c(
  "Single (A)"   = "#0072B2",
  "Single (A+D)" = "#D55E00",
  "Multi (A)"    = "#009E73",
  "Multi (A+D)"  = "#CC79A7"
)

## =========================
## 1) Helper functions
## =========================
## Find the accuracy column regardless of case (Acc / ACC / Accuracy)
.get_acc_col <- function(df) {
  nms <- names(df)
  idx <- which(tolower(nms) %in% c("acc", "accuracy"))
  if (length(idx) == 0) stop("Could not find an Acc/ACC/Accuracy column; please check the column names.")
  nms[idx[1]]
}

## Single-trait data frames: columns k_Fold, Acc, Trait
prep_single <- function(df, approach = "Single", model = "A") {
  stopifnot(all(c("k_Fold", "Acc", "Trait") %in% names(df)))
  df %>%
    transmute(
      Trait    = as.character(Trait),
      Fold     = as.factor(k_Fold),
      Acc      = as.numeric(Acc),
      Approach = approach,
      Model    = model
    )
}

## Multi-trait data frames: one accuracy column, traits stored as row names
prep_multi <- function(df, approach = "Multi", model = "A") {
  acc_col <- .get_acc_col(df)
  df %>%
    rownames_to_column("Trait") %>%
    transmute(
      Trait    = as.character(Trait),
      Fold     = NA,
      Acc      = as.numeric(.data[[acc_col]]),
      Approach = approach,
      Model    = model
    )
}

## Split a trait name such as "ph12" into base trait ("ph") and timepoint (12)
parse_trait <- function(trait_chr) {
  base <- str_extract(trait_chr, "^[A-Za-z]+")
  tp   <- str_extract(trait_chr, "\\d+$")
  tp_i <- suppressWarnings(as.integer(tp))
  tp_i[is.na(tp_i)] <- 0L  # ph/ear/mu are treated as timepoint = 0
  tibble(base_trait = base, timepoint = tp_i)
}

## =========================
## 2) Combine four dataframes into long format
## =========================
acc_long <- bind_rows(
  prep_single(S_GB,  approach = "Single", model = "A"),
  prep_single(S_GBD, approach = "Single", model = "A+D"),
  prep_multi(M_GB,   approach = "Multi",  model = "A"),
  prep_multi(M_GBD,  approach = "Multi",  model = "A+D")
) %>%
  bind_cols(parse_trait(.$Trait)) %>%
  mutate(
    Condition = case_when(
      Approach == "Single" & Model == "A"   ~ "Single (A)",
      Approach == "Single" & Model == "A+D" ~ "Single (A+D)",
      Approach == "Multi"  & Model == "A"   ~ "Multi (A)",
      Approach == "Multi"  & Model == "A+D" ~ "Multi (A+D)"
    ),
    Condition  = factor(Condition, levels = names(okabe_ito)),
    base_trait = factor(base_trait, levels = c("ph", "ear", "mu"))
  ) %>%
  filter(!is.na(Acc))

## =========================
## 3) Trait-level summary
##    (key point: keeps k-fold from giving single-trait results more weight)
## =========================
trait_sum <- acc_long %>%
  group_by(Condition, Approach, Model, Trait, base_trait, timepoint) %>%
  summarise(
    mean_acc = mean(Acc, na.rm = TRUE),
    sd_acc   = sd(Acc, na.rm = TRUE),
    n        = sum(!is.na(Acc)),
    .groups  = "drop"
  ) %>%
  mutate(
    se = ifelse(is.na(sd_acc) | n <= 1, 0, sd_acc / sqrt(n)),
    ci = ifelse(n <= 1, 0, qt(0.975, df = n - 1) * se)
  )

## =========================
## 4) Publication theme
## =========================
theme_pub <- theme_classic(base_size = 13) +
  theme(
    legend.title     = element_blank(),
    legend.position  = "top",
    axis.title.x     = element_text(margin = margin(t = 8)),
    axis.title.y     = element_text(margin = margin(r = 8)),
    strip.background = element_blank(),
    strip.text       = element_text(face = "bold"),
    plot.title       = element_text(face = "bold", hjust = 0),
    plot.margin      = margin(8, 10, 8, 10)
  )

## =========================
## FIG 1: recommended: violin + boxplot + jittered points
## Each trait contributes one value (mean_acc), so the four conditions are compared fairly
## =========================
p1 <- ggplot(trait_sum, aes(x = Condition, y = mean_acc, fill = Condition)) +
  geom_violin(
    width = 0.95, trim = TRUE, alpha = 0.45,
    color = "grey25", linewidth = 0.35
  ) +
  geom_boxplot(
    width = 0.18, outlier.shape = NA, alpha = 0.75,
    color = "grey15", linewidth = 0.35
  ) +
  geom_jitter(
    aes(color = Condition),
    width = 0.10, height = 0, size = 1.0, alpha = 0.35
  ) +
  facet_wrap(~ base_trait, nrow = 1, scales = "free_y") +
  scale_fill_manual(values = okabe_ito) +
  scale_color_manual(values = okabe_ito) +
  labs(
    title = "Prediction accuracy comparison across conditions",
    x = NULL,
    y = "Accuracy (trait-level mean)"
  ) +
  theme_pub +
  guides(color = "none")

## =========================
## FIG 2: trend plot (timepoint 0..24) + 95% CI for single-trait
## Multi-trait conditions with a single value get ci = 0, so their error bars are not visible
## =========================
p2 <- ggplot(trait_sum, aes(x = timepoint, y = mean_acc, color = Condition)) +
  geom_line(linewidth = 0.6) +
  geom_point(size = 1.6) +
  geom_errorbar(aes(ymin = mean_acc - ci, ymax = mean_acc + ci),
                width = 0.22, linewidth = 0.35, alpha = 0.8) +
  facet_wrap(~ base_trait, ncol = 1, scales = "free_y") +
  scale_color_manual(values = okabe_ito) +
  scale_x_continuous(breaks = sort(unique(trait_sum$timepoint))) +
  labs(
    title = "Accuracy trends across timepoints (with 95% CI for k-fold single-trait)",
    x = "Timepoint index (0 = baseline trait name without suffix)",
    y = "Accuracy (mean ± 95% CI)"
  ) +
  theme_pub

## =========================
## FIG 3: dominance gain (A+D − A) difference plot
## Strongly recommended for the results text: dominance improves accuracy by ...
## =========================
delta <- trait_sum %>%
  select(Approach, base_trait, timepoint, Trait, Model, mean_acc) %>%
  pivot_wider(names_from = Model, values_from = mean_acc) %>%
  mutate(delta_AD = `A+D` - `A`) %>%
  filter(!is.na(delta_AD)) %>%
  mutate(Approach = factor(Approach, levels = c("Single", "Multi")))

p3 <- ggplot(delta, aes(x = Approach, y = delta_AD, fill = Approach)) +
  geom_hline(yintercept = 0, linetype = "dashed", linewidth = 0.4, color = "grey35") +
  geom_violin(width = 0.9, trim = TRUE, alpha = 0.5, color = "grey25", linewidth = 0.35) +
  geom_boxplot(width = 0.18, outlier.shape = NA, alpha = 0.75, color = "grey15", linewidth = 0.35) +
  geom_jitter(width = 0.08, height = 0, size = 1.0, alpha = 0.35) +
  facet_wrap(~ base_trait, nrow = 1, scales = "free_y") +
  scale_fill_manual(values = c("Single" = "#0072B2", "Multi" = "#009E73")) +
  labs(
    title = "Dominance gain: (A+D) − A",
    x = NULL,
    y = "Δ Accuracy"
  ) +
  theme_pub +
  guides(fill = "none")

## =========================
## 5) Print plots
## =========================
p1
p2
p3

## =========================
## 6) Save (publication grade: prefer PDF; use 600–900 dpi for bitmaps)
## =========================
ggsave("Fig1_violin_box_traitmean.pdf", p1, width = 9.2, height = 3.8, device = cairo_pdf)
ggsave("Fig2_trend_timepoints.pdf",     p2, width = 8.2, height = 7.5, device = cairo_pdf)
ggsave("Fig3_dominance_gain.pdf",       p3, width = 9.2, height = 3.8, device = cairo_pdf)

ggsave("Fig1_violin_box_traitmean.png", p1, width = 9.2, height = 3.8, dpi = 800)
ggsave("Fig2_trend_timepoints.png",     p2, width = 8.2, height = 7.5, dpi = 800)
ggsave("Fig3_dominance_gain.png",       p3, width = 9.2, height = 3.8, dpi = 800)

How to use these figures in your paper (suggestions)

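  • Fig1: a one-sentence summary of which of the four conditions has the better or more stable overall distribution
  • Fig2: answers whether the advantage is consistent across timepoints/stages (reviewers like to ask this)
  • Fig3: directly quantifies whether dominance effects improve accuracy, by how much, and whether single-trait and multi-trait models agree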
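For the Fig3 statement in particular, a small numeric summary can back up the plot. The sketch below is one possible way to do it, not a prescribed analysis: it uses a made-up stand-in table (`delta_toy`) with the same `Approach` and `delta_AD` columns as the per-trait difference table, and reports the mean and median gain, the share of traits that improved, and a one-sample Wilcoxon signed-rank test of the differences against zero. Traits measured at neighbouring timepoints are probably correlated, so the p-value is better treated as descriptive.

```r
library(dplyr)

# Hypothetical stand-in for the per-trait (A+D) - A differences
set.seed(42)
delta_toy <- data.frame(
  Approach = rep(c("Single", "Multi"), each = 30),
  delta_AD = c(rnorm(30, mean = 0.020, sd = 0.01),
               rnorm(30, mean = 0.005, sd = 0.01))
)

gain_summary <- delta_toy %>%
  group_by(Approach) %>%
  summarise(
    n_traits      = n(),
    mean_gain     = mean(delta_AD),
    median_gain   = median(delta_AD),
    prop_improved = mean(delta_AD > 0),  # share of traits with a positive gain
    p_wilcoxon    = wilcox.test(delta_AD, mu = 0)$p.value,
    .groups = "drop"
  )

print(gain_summary)
```

With real data, swap `delta_toy` for the actual difference table; adding `base_trait` to `group_by()` would give the same summary separately for ph, ear and mu.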

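If you would like Fig2 in a more "Nature/Science style" version (for example a paired slopegraph per timepoint, or the three trait classes combined into one panel layout with a shared y-axis and A/B/C panel labels), I can also provide a more layout-oriented version of the code for your existing data structure (patchwork/cowplot).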
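As a rough idea of what such a layout could look like, here is a minimal patchwork sketch. It assumes the patchwork package is installed and uses made-up trend data (`toy`) plus a hypothetical helper `make_panel`; it only illustrates the shared y-axis and the A/B/C tags, not a finished figure.

```r
library(ggplot2)
library(patchwork)

# Made-up trend data: 3 base traits x 5 timepoints
set.seed(7)
toy <- data.frame(
  base_trait = rep(c("ph", "ear", "mu"), each = 5),
  timepoint  = rep(0:4, times = 3),
  mean_acc   = round(runif(15, min = 0.40, max = 0.70), 3)
)

# One common y range so the three panels are directly comparable
y_lim <- range(toy$mean_acc)

# Hypothetical helper: one trend panel per base trait
make_panel <- function(bt) {
  ggplot(subset(toy, base_trait == bt), aes(x = timepoint, y = mean_acc)) +
    geom_line() +
    geom_point() +
    coord_cartesian(ylim = y_lim) +
    labs(title = bt, x = "Timepoint", y = "Accuracy") +
    theme_classic(base_size = 12)
}

panels <- lapply(c("ph", "ear", "mu"), make_panel)

# Arrange the panels side by side and tag them A, B, C
fig <- wrap_plots(panels, nrow = 1) +
  plot_annotation(tag_levels = "A")

fig
```

Fixing the y range inside each panel keeps the comparison honest across traits, at the cost of hiding small within-trait differences when one trait has a much wider spread.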
