How to plot an UpSet diagram in R

created at 07-01-2021

This post introduces the R package UpSetR and shows how to draw UpSet diagrams with it. UpSet diagrams visualize set intersections better than Venn diagrams or petal diagrams: the intersections of many sets can be read clearly, they handle more groups than a Venn diagram can, and they carry more visual information than a petal diagram.

data structure

tax.txt


use the R package UpSetR to plot an UpSet chart

1. Download and install the UpSetR package, read in the input data, and convert the abundance table into a 0-1 (presence/absence) matrix;

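## Venn-style diagram (UpSetR package, not limited by the number of samples)
install.packages('UpSetR')
library(UpSetR)

# Read the OTU abundance table (assumption: the original does not show this step; 'otu.txt' is a placeholder file name)
otu <- read.table('otu.txt', header = TRUE, sep = '\t', row.names = 1)
# Convert abundances to a 0-1 matrix: 1 = OTU present in the group, 0 = absent
otu1 <- otu
otu1[otu1 > 0] <- 1
# Read the taxonomy annotation
map <- read.table('tax.txt', header = TRUE, sep = '\t', row.names = 1)
# Combine data (the OTU IDs end up in a column named Row.names)
merged <- merge(otu1, map, by = 'row.names', all.x = TRUE)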
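
As a minimal sketch of the input structure the code above appears to assume, the toy example below builds a small OTU abundance table and a taxonomy table in memory instead of reading files. The OTU IDs, abundance values and taxonomy labels are invented for illustration. Only the column names (c1, c2, ... and taxonomy) mirror the ones used in this tutorial.

```r
# Toy OTU abundance table: rows are OTUs, columns are groups (values are made up)
otu_toy <- data.frame(
  c1 = c(0.10, 0.00, 0.25),
  c2 = c(0.00, 0.05, 0.30),
  c3 = c(0.20, 0.00, 0.15),
  row.names = c('OTU_1', 'OTU_2', 'OTU_3')
)

# Toy taxonomy table with a single 'taxonomy' column (labels are made up)
map_toy <- data.frame(
  taxonomy = c('Proteobacteria', 'Firmicutes', 'Others'),
  row.names = c('OTU_1', 'OTU_2', 'OTU_3')
)

# Presence/absence: any abundance above zero becomes 1
otu1_toy <- otu_toy
otu1_toy[otu1_toy > 0] <- 1

# Merge by row names; the OTU IDs are stored in a column called 'Row.names'
merged_toy <- merge(otu1_toy, map_toy, by = 'row.names', all.x = TRUE)
print(merged_toy)
```

The numeric 0-1 columns are what upset() treats as sets, while a non-numeric column such as taxonomy is available as an attribute for element queries.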

2. Use the upset() function to draw a simple UpSet diagram;

# plot
upset(merged)

plot
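
To make clearer what the vertical bars in such a plot represent, the sketch below counts intersections by hand in base R. It assumes, as I understand UpSetR's behaviour, that each element is counted only in its exact combination of sets. The toy data and the names presence, membership and intersection_sizes are made up for illustration.

```r
# Toy 0-1 membership table (made-up data): rows are OTUs, columns are sets
presence <- data.frame(
  c1 = c(1, 1, 0, 1, 0),
  c2 = c(1, 0, 1, 1, 0),
  c3 = c(1, 0, 0, 1, 1),
  row.names = paste0('OTU_', 1:5)
)

# Label each OTU with the exact combination of sets it belongs to, e.g. "c1&c2&c3"
membership <- apply(presence, 1, function(row) paste(names(row)[row == 1], collapse = '&'))

# Number of OTUs per membership pattern, largest first
intersection_sizes <- sort(table(membership), decreasing = TRUE)
print(intersection_sizes)
```

Each entry of intersection_sizes corresponds to one bar, and the pattern in its name corresponds to the filled dots beneath that bar.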

3. To show more intersections, increase nintersects;

# By default, upset() displays 5 sets (nsets = 5) and at most 40 intersections (nintersects = 40)
upset(merged, sets = c('c1','c2','c3','c4','c5','c6'), nintersects = 100)

set nintersects
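
Before deciding which sets to list in the sets argument, it can help to check how large each set is. The sketch below is a base R illustration on made-up data, not part of UpSetR. It totals the 0-1 columns, which should correspond to the horizontal set-size bars in the plot.

```r
# Toy merged table (made-up data): three 0-1 set columns plus a taxonomy attribute
presence <- data.frame(
  c1 = c(1, 1, 0, 1),
  c2 = c(1, 0, 0, 1),
  c3 = c(1, 1, 1, 1),
  taxonomy = c('A', 'B', 'A', 'Others')
)

# Keep only the numeric columns, i.e. the candidate sets
set_cols <- names(presence)[sapply(presence, is.numeric)]

# Set sizes, largest first
set_sizes <- sort(colSums(presence[, set_cols]), decreasing = TRUE)
print(set_sizes)

# Names of the two largest sets, which could then be passed as sets = largest_sets
largest_sets <- names(set_sizes)[1:2]
```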

4. Add sorting:

upset(merged, nsets = 6, nintersects = 100, order.by = c('freq', 'degree'), decreasing = c(TRUE, TRUE))

set nintersects

5. Use the queries parameter to mark the specific intersection you care about, or the distribution of a specific element (the blue element in the figure below);

upset(merged, nsets = 6, nintersects = 100, order.by = c('freq', 'degree'), decreasing = c(TRUE, TRUE), 
      queries = list(list(query = intersects, params = 'c1', color = '#00A087B2'),
                     list(query = intersects, params = c('c1', 'c2', 'c3', 'c4', 'c5', 'c6'), color = 'darksalmon'),
                     list(query = elements, params = c('taxonomy', 'Proteobacteria'), color = 'cyan1', active = TRUE)))

mark the specific intersection

6. If you also want to see the taxonomic classification of the OTUs and the original abundance of the taxa they belong to, you can first draw a pie chart and a histogram, and then combine them with the UpSet diagram through UpSetR;

## Complex style example
# Select the OTUs shared by all 6 groups, calculate relative abundance according to the taxonomy of these OTUs, and draw a pie chart and a histogram
# Then combine them with the UpSet diagram through UpSetR
library(reshape2)
library(doBy)
library(ggplot2)

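# Get the OTUs shared by all 6 groups (merge() may reorder rows, so take the IDs from merged itself)
set_names <- c('c1', 'c2', 'c3', 'c4', 'c5', 'c6')
select_otu <- as.character(merged$Row.names[rowSums(merged[, set_names]) == 6])
# Original abundances of the shared OTUs plus their taxonomy (assumes otu holds the original abundance table)
otu_select <- cbind(otu[select_otu, set_names], taxonomy = map[select_otu, 'taxonomy'])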

#According to taxonomy, calculate the total abundance of these common OTUs
phylum <- summaryBy(c1+c2+c3+c4+c5+c6~taxonomy, otu_select, FUN = sum)
names(phylum) <- c('taxonomy', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6')

phylum_melt <- melt(phylum, id = 'taxonomy')
phylum_melt$value <- phylum_melt$value * 100
phylum_melt_stat <- summaryBy(value~taxonomy, phylum_melt, FUN = mean)

# Sort by mean abundance (ascending) and make 'Others' the first factor level, for easier plotting
phylum_melt_stat <- phylum_melt_stat[order(phylum_melt_stat$value.mean), ]
phylum_melt_stat$taxonomy <- factor(phylum_melt_stat$taxonomy, levels = c('Others', setdiff(as.character(phylum_melt_stat$taxonomy), 'Others')))

#taxonomy Abundance Pie Chart

color=c('#8DD3C7', '#FFFFB3', '#BEBADA', '#FB8072', '#80B1D3', '#FDB462', '#B3DE69', '#FCCDE5', '#BC80BD', "lightskyblue", '#CCEBC5')

plot1 <- function(mydata, x, y) {
  ggplot(phylum_melt_stat, aes(x = '', y = value.mean, fill = taxonomy)) +
    geom_bar(stat = 'identity', show.legend = FALSE) +
    coord_polar(theta = 'y') +
    scale_fill_manual(values = color) +
    theme(panel.grid = element_blank(), panel.background = element_blank(), axis.text.x = element_blank(), plot.background = element_blank()) +
    labs(x = NULL, y = 'Number of all shared OTUs: 1628\nAverage abundance of main phylum')
}
#taxonomy abundance histogram
phylum_melt$taxonomy <- factor(phylum_melt$taxonomy, levels = levels(phylum_melt_stat$taxonomy))

plot2 <- function(mydata, x, y) {
  ggplot(phylum_melt, aes(x = variable, y = value, fill = taxonomy)) +
    geom_col(position = 'stack', width = 0.6) +
    scale_fill_manual(values = color) +
    theme(panel.grid = element_blank(), panel.background = element_rect(color = 'black', fill = 'transparent')) +
    labs(x = '', y = 'Relative abundance (%)', fill = NULL)
}

#Combination style, specify additional graphics through the attribute.plots parameter
upset(merged, nsets = 6, nintersects = 100, order.by = c('freq', 'degree'), decreasing = c(TRUE, TRUE), 
      queries = list(list(query = intersects, params = c('c1', 'c2', 'c3', 'c4', 'c5', 'c6'), color = 'gray', active = TRUE)),
      attribute.plots = list(gridrows = 80, ncols = 2,
                             plots = list(list(plot = plot1, mydata = NA, x = NA, y = NA, queries = FALSE),
                                          list(plot = plot2, mydata = NA, x = NA, y = NA, queries = FALSE))))

Note:
The additional histogram and pie chart use the original OTU abundance data, which is independent of the 0-1 OTU data shown in the UpSet chart itself.
The ggplot2 commands for these external plots are wrapped directly inside the plot1 and plot2 functions.
Even so, plot1 and plot2 must still declare the three parameters "mydata, x, y"; if they are left out, upset() will not recognize the functions.
This is handled by passing NA for each of these parameters in attribute.plots.

final result
