The igraph package visualizes gene regulatory networks

created at 07-11-2021 views: 2

introduction

The overall idea can be divided into two major steps:

  1. According to the sequencing data, mine a group of genes that are highly related to a specific biological factor, and the corresponding expression information;
  2. Then calculate the expression correlation between these genes and use the igraph package for visualization.

Step 1 Get the gene info

Sample data: Sequencing data provided by the estrogen package. According to whether estrogen is added or not, and the duration of action, they are divided into four groups, and each group is repeated twice.

 estrogen time.h
low10-1.celabsent10
low10-2.celabsent10
high10-1.celpresent10
high10-2.celpresent10
low48-1.celabsent48
low48-2.celabsent48
high48-1.celpresent48
high48-2.celpresent48

Genes of interest: Top genes whose expression levels are significantly related to whether estrogen is added, and the duration of action.

1.1 Get the expression matrix

expression matrix

# Identify genes of significant effect
lm.coef <- function(y) lm(y ~ estrogen * time.h)$coefficients
eff <- esApply(x, 1, lm.coef)
effectUp <- names(sort(eff[2,], decreasing=TRUE)[1:25])
effectDown <- names(sort(eff[2,], decreasing=FALSE)[1:25])
main.effects <- c(effectUp, effectDown)
str(main.effects)
#chr [1:50] "36785_at" "39755_at" "151_s_at" "32174_at" "39781_at" "1985_s_at" "1840_g_at" ...

# Filter the expression set object to include only genes of significant effect
estrogenMainEffects <- exprs(x)[main.effects,]
dim(estrogenMainEffects)
#[1] 50  8

Step 2 visualize these gene regulatory network

2.1 Calculate the relationship/connection between two genes

#Based on relevance
gene_cor=as.matrix(as.dist(cor(t(estrogenMainEffects), method="pearson")))
dim(gene_cor)
#50 50

#Based on Euclidean distance
#gene_dist=as.matrix(dist(estrogenMainEffects, method="euclidean"))

2.2 Create igraph object

library(igraph)
# Create a graph adjacency based on correlation distances between genes in  pairwise fashion.
g <- graph.adjacency(
  gene_cor,
  mode="undirected",
  weighted=TRUE,
  diag=FALSE
)

# Simplfy the adjacency object
g <- simplify(g, remove.multiple=TRUE, remove.loops=TRUE)

igraph object

2.3 Modify igraph object

Set the color of the edge according to the positive and negative of the correlation between genes

# Colour negative correlation edges as blue
E(g)[which(E(g)$weight<0)]$color <- "darkblue"

# Colour positive correlation edges as red
E(g)[which(E(g)$weight>0)]$color <- "darkred"

Eliminate edges with correlation less than 0.8

# Convert edge weights to absolute values
E(g)$weight <- abs(E(g)$weight)

# Remove edges below absolute Pearson correlation 0.8
g <- delete_edges(g, E(g)[which(E(g)$weight<0.8)])

# Remove any vertices remaining that have no edges
g <- delete.vertices(g, degree(g)==0)

# Assign names to the graph vertices (optional)
V(g)$name <- V(g)$name

Set the shape and color of the node

# Change shape of graph vertices
V(g)$shape <- "sphere"

# Change colour of graph vertices
V(g)$color <- "skyblue"

# Change colour of vertex frames
V(g)$vertex.frame.color <- "white"

Set the size of nodes and edges respectively. The former is related to the amount of gene expression, the latter depends on the correlation

# Scale the size of the vertices to be proportional to the level of expression of each gene represented by each vertex
# Multiply scaled vales by a factor of 10
scale01 <- function(x){(x-min(x))/(max(x)-min(x))}
vSizes <- (scale01(apply(estrogenMainEffects, 1, mean)) + 1.0) * 5

# Amplify or decrease the width of the edges
edgeweights <- E(g)$weight * 2.0

Calculate the minimum spanning tree (MST);

# Convert the graph adjacency object into a minimum spanning tree based on Prim's algorithm
mst <- mst(g, algorithm="prim")

2.4 Visualization

Direct visualization

set.seed(1)
plot(
  mst,
  layout=layout.graphopt,
  edge.curved=TRUE,
  vertex.size=vSizes,
  vertex.label.dist=-0.5,
  vertex.label.color="black",
  asp=FALSE,
  vertex.label.cex=0.6,
  edge.width=edgeweights,
  edge.arrow.mode=0,
  main="My first graph")

Other optional layout methods are:

  • layout.fruchterman.reingold
  • layout.kamada.kawai
  • layout.lgl
  • layout.circle
  • layout.graphopt

first graph

Set module label

mst.communities <- edge.betweenness.community(mst, weights=NULL, directed=FALSE)
mst.clustering <- make_clusters(mst, membership=mst.communities$membership)
V(mst)$color <- mst.communities$membership + 1

set.seed(1)
plot(
  mst,
  layout=layout.fruchterman.reingold,
  edge.curved=TRUE,
  vertex.size=vSizes,
  vertex.label.dist=-0.5,
  vertex.label.color="black",
  asp=FALSE,
  vertex.label.cex=0.6,
  edge.width=edgeweights,
  edge.arrow.mode=0,
  main="My second graph")

second graph

set.seed(1)
plot(
  mst.clustering, mst,
  layout=layout.fruchterman.reingold,
  edge.curved=TRUE,
  vertex.size=vSizes,
  vertex.label.dist=-0.5,
  vertex.label.color="black",
  asp=FALSE,
  vertex.label.cex=0.6,
  edge.width=edgeweights,
  edge.arrow.mode=0,
  main="My second graph")

third graph

Use the edge.betweenness.community of the igraph package for module division or clustering as above. The igraph package also provides a variety of other partitioning algorithms for use, see here for details

In addition, for case-control study, a useful approach is to generate separate networks for cases and controls and then compare the genes in these based on these metrics. Mention it briefly, so I won't repeat it.

Reference

https://www.biostars.org/p/285296/

Please log in to leave a comment.