The enrichment scatter plot is a graphical display of enrichment analysis results. In the figure, the degree of enrichment is measured by three quantities: GeneRatio, padj, and the number of genes enriched in the pathway. GeneRatio is the ratio of the number of target genes annotated to the pathway to the total number of annotated genes in that pathway. padj is the p-value after correction for multiple hypothesis testing; the closer it is to 0, the more significant the enrichment.
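As a minimal sketch of these two quantities, the following R snippet converts GeneRatio strings of the form "k/n" into numbers and adjusts a few raw p-values. All values are made up for illustration, and the Benjamini-Hochberg method is an assumption, since the text does not say which correction was used.

```r
# Hypothetical values, for illustration only
gene_ratio <- c("12/150", "8/90", "3/60")
pvalue <- c(0.001, 0.02, 0.04)

# Split each "k/n" string and divide the numerator by the denominator
parts <- strsplit(gene_ratio, "/", fixed = TRUE)
ratio <- vapply(parts, function(x) as.numeric(x[1]) / as.numeric(x[2]), numeric(1))

# Adjust the p-values for multiple testing (BH is an assumed choice)
padj <- p.adjust(pvalue, method = "BH")

print(data.frame(gene_ratio, ratio, pvalue, padj))
```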
library(argparser)
library(ggplot2)
# Parse command-line arguments
argv <- arg_parser('Draw a scatter plot based on the results of the enrichment analysis')
argv <- add_argument(argv, "--enrich", help = "Enrichment result file")
argv <- add_argument(argv, "--ndot", help = "Number of pathways to show in the figure", type = 'numeric')
argv <- add_argument(argv, "--title", help = "Plot title")
argv <- add_argument(argv, "--prefix", help = "Output file prefix")
argv <- parse_args(argv)
enrich <- argv$enrich
ndot <- argv$ndot
title <- argv$title
prefix <- argv$prefix
# Read the tab-separated enrichment table and drop rows containing NA
enrich <- read.delim(enrich, header = TRUE, sep = '\t')
enrich <- na.omit(enrich)
# Sort by padj (ascending) and keep at most ndot pathways for plotting
enrich <- enrich[order(enrich$padj, decreasing = FALSE), ]
if (nrow(enrich) > ndot) { enrich <- enrich[1:ndot, ] }
# Convert GeneRatio from a "k/n" string into a numeric ratio
ratio <- matrix(as.numeric(unlist(strsplit(as.character(enrich$GeneRatio), "/"))), ncol = 2, byrow = TRUE)
enrich$GeneRatio <- ratio[, 1] / ratio[, 2]
# Draw the enrichment scatter plot: colour encodes padj, point size encodes Count
p <- ggplot(enrich, aes(x = GeneRatio, y = Description, colour = padj, size = Count))
p <- p + geom_point()
# Extend the colour scale to cover the full padj range from 0 to 1
p <- p + scale_colour_gradientn(colours = rainbow(4), guide = "colourbar") + expand_limits(color = seq(0, 1, by = 0.25))
p <- p + ggtitle(title) + xlab("GeneRatio") + ylab("")
p <- p + theme_bw() + theme(axis.text = element_text(color = "black", size = 10))
p <- p + theme(panel.border = element_rect(colour = "black"))
p <- p + theme(plot.title = element_text(vjust = 1), legend.key = element_blank())
ggsave(paste(prefix, '.dot.png', sep = ''), plot = p, width = 8, height = 6, type = 'cairo-png')
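The resulting graph is read as follows. The vertical axis represents the pathway name, and the horizontal axis represents the GeneRatio corresponding to the pathway. padj is represented by the color of the dots: the smaller the padj, the closer the color is to red. The size of each dot represents the number of genes enriched in the pathway.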
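To try the plotting steps without real data, a small table with the four columns the script reads (Description, GeneRatio, padj, Count) can be built by hand. The sketch below uses made-up pathway names and values, and wraps the sorting and top-n selection in a hypothetical helper, top_pathways, that is not part of the original script.

```r
# Made-up enrichment results with the columns the script expects
enrich <- data.frame(
  Description = c("Pathway A", "Pathway B", "Pathway C", "Pathway D"),
  GeneRatio   = c("12/150", "8/40", "20/310", "5/25"),
  padj        = c(0.03, 0.001, 0.2, 0.01),
  Count       = c(12, 8, 20, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n rows with the smallest padj
top_pathways <- function(df, n) {
  df <- df[order(df$padj), ]
  head(df, n)
}

top <- top_pathways(enrich, 3)
print(top$Description)
```

Writing such a table with write.table(enrich, "toy_enrich.txt", sep = "\t", quote = FALSE, row.names = FALSE) should give a file that can be passed to --enrich; the file name here is only a placeholder.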