Node clustering analysis on graph data using node2vec and k-means

created at 01-31-2022 views: 26

1. k-means

The simplest idea is to perform unsupervised classification of nodes, specifically clustering, clustering similar nodes together according to their vector representation, here is the code directly:

def k_means(m, K):
    """
    :param m: training result of node2vec
    :param K: number of categories
    :return: clustering result
    """
    nodes = list(G.nodes)
    # Arbitrarily select K nodes as the initial cluster center
    centers = []
    temp = []
    for i in range(K):
        t = np.random.randint(0, len(nodes) - 1)
        if nodes[t] not in centers:
            temp.append(nodes[t])
            centers.append(m[nodes[t]]) # The center is a 128-dimensional vector

    # Iterate 50 times
    res = {}
    for i in range(K):
        res[i] = []

    for time in range(50):
        # clear
        for i in range(K):
            res[i].clear()
        # Calculate the distance from the vector of each point to the cluster center
        nodes_distance = {}
        for node in nodes:
            # distance from node to central node
            node_distance = []
            for center in centers:
                node_distance.append(get_dis(m[node], center))
            nodes_distance[node] = node_distance # Save the distance from node to each center
        # Re-classify each node, select a nearest node to classify, the class is 0-5
        for node in nodes:
            temp = nodes_distance[node] # stores 6 distances
            cls = temp.index(min(temp))
            res[cls].append(node)

        # update cluster centers
        centers.clear()
        for i in range(K):
            center = []
            for j in range(128):
                t = [m[node][j] for node in res[i]] # the jth coordinate of all nodes in the ith category
                center.append(np.mean(t))
            centers.append(center)

    return res

2.karate_club_graph

Karate Club Network: 34 nodes, each representing a member, with edges recording connections between paired members interacting outside the club. During the research period, a conflict arose between administrator John A and coach Mr. Hi (pseudonym), which resulted in the club being split in two. Half of the members formed a new club around Mr. Hi; another part of the members found a new coach or gave up karate.

Simply put, each node has a label (there are two labels in total). Let's output:

G = nx.karate_club_graph()
nodes = list(G.nodes.data())
print(nodes)

The output is as follows:

[(0, {'club': 'Mr. Hi'}), (1, {'club': 'Mr. Hi'}), (2, {'club': 'Mr. Hi'}), (3, {'club': 'Mr. Hi'}), (4, {'club': 'Mr. Hi'}), (5, {'club': 'Mr. Hi'}), (6, {'club': 'Mr. Hi'}), (7, {'club': 'Mr. Hi'}), (8, {'club': 'Mr. Hi'}), (9, {'club': 'Officer'}), (10, {'club': 'Mr. Hi'}), (11, {'club': 'Mr. Hi'}), (12, {'club': 'Mr. Hi'}), (13, {'club': 'Mr. Hi'}), (14, {'club': 'Officer'}), (15, {'club': 'Officer'}), (16, {'club': 'Mr. Hi'}), (17, {'club': 'Mr. Hi'}), (18, {'club': 'Officer'}), (19, {'club': 'Mr. Hi'}), (20, {'club': 'Officer'}), (21, {'club': 'Mr. Hi'}), (22, {'club': 'Officer'}), (23, {'club': 'Officer'}), (24, {'club': 'Officer'}), (25, {'club': 'Officer'}), (26, {'club': 'Officer'}), (27, {'club': 'Officer'}), (28, {'club': 'Officer'}), (29, {'club': 'Officer'}), (30, {'club': 'Officer'}), (31, {'club': 'Officer'}), (32, {'club': 'Officer'}), (33, {'club': 'Officer'})]

As you can see, all members are divided into two categories: Mr. Hi and Officer.

We draw the original graph (different classes are marked with different colors):

color_map = []
ns = list(G.nodes.data())
nodes = list(g.nodes)
for node in nodes:
    if ns[node][1]['club'] == 'Mr. Hi':
        color_map.append('#DCBB8A')
    else:
        color_map.append('#98BBEF')
nx.draw(G, node_color=color_map, pos=pos, with_labels=True, node_size=1000)
plt.show()

as shown bellow:

result

After using node2vec to get the vector representation of all nodes, the effect of clustering is as follows:

It can be found that the effect of clustering for node classification is very poor (node2vec is the picture below, and the picture above is the original picture)

3.davis_southern_women_graph

The Davis Southern Women's Social Network, node2vec behaves as follows:

result

4.Summary

Using clustering to directly classify nodes has a general effect!

Attached: The complete code for drawing:

def plot(g, m):
     """
     :param g: figure
     :param m: the vector representation of the node
     :return: none
     """
     # draw according to the original label
     pos = nx.spring_layout(G)

     color_map = []
     ns = list(G.nodes.data())
     nodes = list(g.nodes)
     for node in nodes:
         if ns[node][1]['club'] == 'Mr. Hi':
             color_map.append('#DCBB8A')
         else:
             color_map.append('#98BBEF')

     plt.subplot(2, 1, 1)
     nx.draw(G, node_color=color_map, pos=pos, with_labels=True, node_size=1000)

     K = 2
     res = k_means(m, K)

     colors = ['#DCBB8A', '#98BBEF', 'navy', 'indigo', 'orange', 'blue']
     color_map.clear()
     for node in nodes:
         for i in range(len(res)):
             if node in res[i]:
                 color_map.append(colors[i])
                 break
     # draw
     plt.subplot(2, 1, 2)
     nx.draw(G, node_color=color_map, pos=pos, with_labels=True, node_size=1000)
     plt.show()
created at:01-31-2022
edited at: 01-31-2022: