# Node clustering analysis on graph data using node2vec and k-means

created at 01-31-2022 views: 26

## 1. k-means¶

The simplest idea is to perform unsupervised classification of nodes, specifically clustering, clustering similar nodes together according to their vector representation, here is the code directly:

def k_means(m, K):
"""
:param m: training result of node2vec
:param K: number of categories
:return: clustering result
"""
nodes = list(G.nodes)
# Arbitrarily select K nodes as the initial cluster center
centers = []
temp = []
for i in range(K):
t = np.random.randint(0, len(nodes) - 1)
if nodes[t] not in centers:
temp.append(nodes[t])
centers.append(m[nodes[t]]) # The center is a 128-dimensional vector

# Iterate 50 times
res = {}
for i in range(K):
res[i] = []

for time in range(50):
# clear
for i in range(K):
res[i].clear()
# Calculate the distance from the vector of each point to the cluster center
nodes_distance = {}
for node in nodes:
# distance from node to central node
node_distance = []
for center in centers:
node_distance.append(get_dis(m[node], center))
nodes_distance[node] = node_distance # Save the distance from node to each center
# Re-classify each node, select a nearest node to classify, the class is 0-5
for node in nodes:
temp = nodes_distance[node] # stores 6 distances
cls = temp.index(min(temp))
res[cls].append(node)

# update cluster centers
centers.clear()
for i in range(K):
center = []
for j in range(128):
t = [m[node][j] for node in res[i]] # the jth coordinate of all nodes in the ith category
center.append(np.mean(t))
centers.append(center)

return res


## 2.karate_club_graph¶

Karate Club Network: 34 nodes, each representing a member, with edges recording connections between paired members interacting outside the club. During the research period, a conflict arose between administrator John A and coach Mr. Hi (pseudonym), which resulted in the club being split in two. Half of the members formed a new club around Mr. Hi; another part of the members found a new coach or gave up karate.

Simply put, each node has a label (there are two labels in total). Let's output:

G = nx.karate_club_graph()
nodes = list(G.nodes.data())
print(nodes)


The output is as follows:

[(0, {'club': 'Mr. Hi'}), (1, {'club': 'Mr. Hi'}), (2, {'club': 'Mr. Hi'}), (3, {'club': 'Mr. Hi'}), (4, {'club': 'Mr. Hi'}), (5, {'club': 'Mr. Hi'}), (6, {'club': 'Mr. Hi'}), (7, {'club': 'Mr. Hi'}), (8, {'club': 'Mr. Hi'}), (9, {'club': 'Officer'}), (10, {'club': 'Mr. Hi'}), (11, {'club': 'Mr. Hi'}), (12, {'club': 'Mr. Hi'}), (13, {'club': 'Mr. Hi'}), (14, {'club': 'Officer'}), (15, {'club': 'Officer'}), (16, {'club': 'Mr. Hi'}), (17, {'club': 'Mr. Hi'}), (18, {'club': 'Officer'}), (19, {'club': 'Mr. Hi'}), (20, {'club': 'Officer'}), (21, {'club': 'Mr. Hi'}), (22, {'club': 'Officer'}), (23, {'club': 'Officer'}), (24, {'club': 'Officer'}), (25, {'club': 'Officer'}), (26, {'club': 'Officer'}), (27, {'club': 'Officer'}), (28, {'club': 'Officer'}), (29, {'club': 'Officer'}), (30, {'club': 'Officer'}), (31, {'club': 'Officer'}), (32, {'club': 'Officer'}), (33, {'club': 'Officer'})]


As you can see, all members are divided into two categories: Mr. Hi and Officer.

We draw the original graph (different classes are marked with different colors):

color_map = []
ns = list(G.nodes.data())
nodes = list(g.nodes)
for node in nodes:
if ns[node][1]['club'] == 'Mr. Hi':
color_map.append('#DCBB8A')
else:
color_map.append('#98BBEF')
nx.draw(G, node_color=color_map, pos=pos, with_labels=True, node_size=1000)
plt.show()


as shown bellow:

After using node2vec to get the vector representation of all nodes, the effect of clustering is as follows:

It can be found that the effect of clustering for node classification is very poor (node2vec is the picture below, and the picture above is the original picture)

## 3.davis_southern_women_graph¶

The Davis Southern Women's Social Network, node2vec behaves as follows:

## 4.Summary¶

Using clustering to directly classify nodes has a general effect!

Attached: The complete code for drawing:

def plot(g, m):
"""
:param g: figure
:param m: the vector representation of the node
:return: none
"""
# draw according to the original label
pos = nx.spring_layout(G)

color_map = []
ns = list(G.nodes.data())
nodes = list(g.nodes)
for node in nodes:
if ns[node][1]['club'] == 'Mr. Hi':
color_map.append('#DCBB8A')
else:
color_map.append('#98BBEF')

plt.subplot(2, 1, 1)
nx.draw(G, node_color=color_map, pos=pos, with_labels=True, node_size=1000)

K = 2
res = k_means(m, K)

colors = ['#DCBB8A', '#98BBEF', 'navy', 'indigo', 'orange', 'blue']
color_map.clear()
for node in nodes:
for i in range(len(res)):
if node in res[i]:
color_map.append(colors[i])
break
# draw
plt.subplot(2, 1, 2)
nx.draw(G, node_color=color_map, pos=pos, with_labels=True, node_size=1000)
plt.show()

created at:01-31-2022