How to plot an enrichment analysis scatter diagram with ggplot2

created at 06-30-2021

1. Scatter plot of enrichment analysis

The enrichment analysis scatter plot is a graphical display of enrichment analysis results. In the figure, the degree of enrichment is measured by three quantities: GeneRatio, padj, and the number of genes enriched in the pathway. GeneRatio is the ratio of the number of target genes annotated to the pathway to the total number of annotated genes in that pathway. padj is the p-value after correction for multiple hypothesis testing; the closer it is to 0, the more significant the enrichment.
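
As a rough illustration of these two quantities, the sketch below adjusts a few p-values and converts GeneRatio strings into numbers. All values are made up, and the Benjamini-Hochberg method is only an assumption, since the correction method behind padj is not stated here. The "numerator/denominator" string format matches what the plotting script in the next section parses.

```r
# Hypothetical raw p-values for four pathways (illustrative only).
pvalue <- c(0.001, 0.01, 0.02, 0.5)

# Assumption: Benjamini-Hochberg correction; the method used to produce
# padj in a real enrichment file may differ.
padj <- p.adjust(pvalue, method = "BH")

# Hypothetical GeneRatio strings in "numerator/denominator" form.
gene_ratio_raw <- c("5/100", "12/100", "3/100", "20/100")

# Split each string on "/" and divide the numerator by the denominator.
parts <- strsplit(gene_ratio_raw, "/", fixed = TRUE)
gene_ratio <- vapply(
  parts,
  function(x) as.numeric(x[1]) / as.numeric(x[2]),
  numeric(1)
)

print(data.frame(pvalue, padj, gene_ratio_raw, gene_ratio))
```

An adjusted p-value is never smaller than the raw p-value it was derived from, so filtering on padj is the more conservative choice.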

2. Code of scatter plot

library(argparser)
library(ggplot2)

# Command-line arguments
argv <- arg_parser('Draw a scatter plot based on the results of the enrichment analysis')
argv <- add_argument(argv, "--enrich", help = "Enrichment result file")
argv <- add_argument(argv, "--ndot", help = "Number of pathways to show in the figure", type = 'numeric')
argv <- add_argument(argv, "--title", help = "Plot title")
argv <- add_argument(argv, "--prefix", help = "Output image prefix")
argv <- parse_args(argv)
enrich <- argv$enrich
ndot <- argv$ndot
title <- argv$title
prefix <- argv$prefix

# Read the tab-separated enrichment table and drop rows with missing values
enrich <- read.delim(enrich, header = TRUE, sep = '\t')
enrich <- na.omit(enrich)

# Sort by padj (ascending) and keep the ndot most significant pathways
enrich <- enrich[order(enrich$padj, decreasing = FALSE), ]
if (nrow(enrich) > ndot) { enrich <- enrich[1:ndot, ] }

# Convert GeneRatio from a "numerator/denominator" string to a numeric ratio
ratio <- matrix(as.numeric(unlist(strsplit(as.character(enrich$GeneRatio), "/"))), ncol = 2, byrow = TRUE)
enrich$GeneRatio <- ratio[, 1] / ratio[, 2]

# Draw the enrichment scatter plot
p <- ggplot(enrich, aes(x = GeneRatio, y = Description, colour = padj, size = Count))
p <- p + geom_point()
p <- p + scale_colour_gradientn(colours = rainbow(4), guide = "colourbar") + expand_limits(color = seq(0,1,by = 0.25))
p <- p + ggtitle(title) + xlab("GeneRatio") + ylab("")
p <- p + theme_bw() + theme(axis.text = element_text(color = "black", size = 10))
p <- p + theme(panel.border = element_rect(colour = "black"))
p <- p + theme(plot.title = element_text(vjust = 1), legend.key = element_blank())
ggsave(paste(prefix, '.dot.png', sep = ''), plot = p, width = 8, height = 6, type = 'cairo-png')
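
To try the script without real data, a small input file can be built by hand. The sketch below is only an illustration: the pathway names and numbers are invented, and the column set (Description, GeneRatio, padj, Count) is inferred from the columns the script reads, not from a documented file format.

```r
# Hypothetical enrichment results; all values are made up for illustration.
toy <- data.frame(
  Description = c("Pathway A", "Pathway B", "Pathway C"),
  GeneRatio   = c("12/150", "8/150", "20/150"),
  padj        = c(0.001, 0.03, 0.2),
  Count       = c(12, 8, 20),
  stringsAsFactors = FALSE
)

# Write a tab-separated file that the script's read.delim() call can parse.
toy_file <- tempfile(fileext = ".tsv")
write.table(toy, toy_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back the same way the script does and check the required columns.
check <- read.delim(toy_file, header = TRUE, sep = "\t")
required <- c("Description", "GeneRatio", "padj", "Count")
print(required %in% colnames(check))
```

The resulting file could then be passed to the script through --enrich, together with values for --ndot, --title, and --prefix.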

3. Graphic display

The graph is shown as follows.

[Figure: Scatter plot of enrichment analysis]

  • The vertical axis represents the pathway name, and the horizontal axis represents the GeneRatio corresponding to the pathway.
  • The color of each dot represents padj. The smaller the padj, the closer the color is to red.
  • The size of each dot represents the number of differential genes contained in the pathway.
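
The color behavior described above follows from the palette used in the script. As a minimal sketch, assuming the default even spacing of colors in scale_colour_gradientn(), the snippet below lists the four anchor colors of rainbow(4) and the approximate padj value each one corresponds to once expand_limits() has stretched the color scale over the range 0 to 1.

```r
# The four anchor colours passed to scale_colour_gradientn() in the script.
anchors <- rainbow(4)

# expand_limits(color = seq(0, 1, by = 0.25)) makes the colour scale cover
# [0, 1]; with default settings the anchors are spaced evenly over it.
anchor_positions <- seq(0, 1, length.out = length(anchors))

# Show each anchor as RGB components, labelled by its approximate padj value.
rgb_values <- t(col2rgb(anchors))
rownames(rgb_values) <- paste0("padj ~ ", round(anchor_positions, 2))
print(rgb_values)
```

The first anchor, at padj = 0, is pure red, which is why the most significant pathways appear red in the figure.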