The overall idea can be divided into two major steps:
Sample data: Sequencing data provided by the estrogen package. According to whether estrogen is added or not, and the duration of action, they are divided into four groups, and each group is repeated twice.
estrogen | time.h | |
low10-1.cel | absent | 10 |
low10-2.cel | absent | 10 |
high10-1.cel | present | 10 |
high10-2.cel | present | 10 |
low48-1.cel | absent | 48 |
low48-2.cel | absent | 48 |
high48-1.cel | present | 48 |
high48-2.cel | present | 48 |
Genes of interest: Top genes whose expression levels are significantly related to whether estrogen is added, and the duration of action.
# Identify genes of significant effect
lm.coef <- function(y) lm(y ~ estrogen * time.h)$coefficients
eff <- esApply(x, 1, lm.coef)
effectUp <- names(sort(eff[2,], decreasing=TRUE)[1:25])
effectDown <- names(sort(eff[2,], decreasing=FALSE)[1:25])
main.effects <- c(effectUp, effectDown)
str(main.effects)
#chr [1:50] "36785_at" "39755_at" "151_s_at" "32174_at" "39781_at" "1985_s_at" "1840_g_at" ...
# Filter the expression set object to include only genes of significant effect
estrogenMainEffects <- exprs(x)[main.effects,]
dim(estrogenMainEffects)
#[1] 50 8
#Based on relevance
gene_cor=as.matrix(as.dist(cor(t(estrogenMainEffects), method="pearson")))
dim(gene_cor)
#50 50
#Based on Euclidean distance
#gene_dist=as.matrix(dist(estrogenMainEffects, method="euclidean"))
library(igraph)
# Create a graph adjacency based on correlation distances between genes in pairwise fashion.
g <- graph.adjacency(
gene_cor,
mode="undirected",
weighted=TRUE,
diag=FALSE
)
# Simplfy the adjacency object
g <- simplify(g, remove.multiple=TRUE, remove.loops=TRUE)
Set the color of the edge according to the positive and negative of the correlation between genes
# Colour negative correlation edges as blue
E(g)[which(E(g)$weight<0)]$color <- "darkblue"
# Colour positive correlation edges as red
E(g)[which(E(g)$weight>0)]$color <- "darkred"
Eliminate edges with correlation less than 0.8
# Convert edge weights to absolute values
E(g)$weight <- abs(E(g)$weight)
# Remove edges below absolute Pearson correlation 0.8
g <- delete_edges(g, E(g)[which(E(g)$weight<0.8)])
# Remove any vertices remaining that have no edges
g <- delete.vertices(g, degree(g)==0)
# Assign names to the graph vertices (optional)
V(g)$name <- V(g)$name
Set the shape and color of the node
# Change shape of graph vertices
V(g)$shape <- "sphere"
# Change colour of graph vertices
V(g)$color <- "skyblue"
# Change colour of vertex frames
V(g)$vertex.frame.color <- "white"
Set the size of nodes and edges respectively. The former is related to the amount of gene expression, the latter depends on the correlation
# Scale the size of the vertices to be proportional to the level of expression of each gene represented by each vertex
# Multiply scaled vales by a factor of 10
scale01 <- function(x){(x-min(x))/(max(x)-min(x))}
vSizes <- (scale01(apply(estrogenMainEffects, 1, mean)) + 1.0) * 5
# Amplify or decrease the width of the edges
edgeweights <- E(g)$weight * 2.0
Calculate the minimum spanning tree (MST);
# Convert the graph adjacency object into a minimum spanning tree based on Prim's algorithm
mst <- mst(g, algorithm="prim")
Direct visualization
set.seed(1)
plot(
mst,
layout=layout.graphopt,
edge.curved=TRUE,
vertex.size=vSizes,
vertex.label.dist=-0.5,
vertex.label.color="black",
asp=FALSE,
vertex.label.cex=0.6,
edge.width=edgeweights,
edge.arrow.mode=0,
main="My first graph")
Other optional layout methods are:
layout.fruchterman.reingold
layout.kamada.kawai
layout.lgl
layout.circle
layout.graphopt
Set module label
mst.communities <- edge.betweenness.community(mst, weights=NULL, directed=FALSE)
mst.clustering <- make_clusters(mst, membership=mst.communities$membership)
V(mst)$color <- mst.communities$membership + 1
set.seed(1)
plot(
mst,
layout=layout.fruchterman.reingold,
edge.curved=TRUE,
vertex.size=vSizes,
vertex.label.dist=-0.5,
vertex.label.color="black",
asp=FALSE,
vertex.label.cex=0.6,
edge.width=edgeweights,
edge.arrow.mode=0,
main="My second graph")
set.seed(1)
plot(
mst.clustering, mst,
layout=layout.fruchterman.reingold,
edge.curved=TRUE,
vertex.size=vSizes,
vertex.label.dist=-0.5,
vertex.label.color="black",
asp=FALSE,
vertex.label.cex=0.6,
edge.width=edgeweights,
edge.arrow.mode=0,
main="My second graph")
Use the edge.betweenness.community
of the igraph package for module division or clustering as above. The igraph package also provides a variety of other partitioning algorithms for use, see here for details
In addition, for case-control study, a useful approach is to generate separate networks for cases and controls and then compare the genes in these based on these metrics. Mention it briefly, so I won't repeat it.