Python: processing the Les Misérables network and drawing graphs with networkx

created at 01-31-2022

Dataset introduction

The character relationship graph of "Les Misérables" has a total of 77 nodes and 254 edges.

[Figure: screenshot of the dataset]

Opening README.md shows:

Les Misérables network, part of the Koblenz Network Collection
===========================================================================

This directory contains the TSV and related files of the moreno_lesmis network: This undirected network contains co-occurrences of characters in Victor Hugo's novel 'Les Misérables'. A node represents a character and an edge between two nodes shows that these two characters appeared in the same chapter of the book. The weight of each link indicates how often such a co-appearance occurred.


More information about the network is provided here: 
http://konect.cc/networks/moreno_lesmis

Files: 
    meta.moreno_lesmis -- Metadata about the network 
    out.moreno_lesmis -- The adjacency matrix of the network in whitespace-separated values format, with one edge per line
      The meaning of the columns in out.moreno_lesmis are: 
        First column: ID of from node 
        Second column: ID of to node
        Third column (if present): weight or multiplicity of edge
        Fourth column (if present):  timestamp of edges Unix time
        Third column: edge weight


Use the following References for citation:

@MISC{konect:2017:moreno_lesmis,
    title = {Les Misérables network dataset -- {KONECT}},
    month = oct,
    year = {2017},
    url = {http://konect.cc/networks/moreno_lesmis}
}

@book{konect:knuth1993,
    title = {The {Stanford} {GraphBase}: A Platform for Combinatorial Computing},
    author = {Knuth, Donald Ervin},
    volume = {37},
    year = {1993},
    publisher = {Addison-Wesley Reading},
}



@inproceedings{konect,
    title = {{KONECT} -- {The} {Koblenz} {Network} {Collection}},
    author = {Jérôme Kunegis},
    year = {2013},
    booktitle = {Proc. Int. Conf. on World Wide Web Companion},
    pages = {1343--1350},
    url = {http://dl.acm.org/citation.cfm?id=2488173},
    url_presentation = {https://www.slideshare.net/kunegis/presentationwow},
    url_web = {http://konect.cc/},
    url_citations = {https://scholar.google.com/scholar?cites=7174338004474749050},
}


From this we can see that the graph is undirected: each node represents a character in "Les Misérables", an edge between two nodes means that the two characters appear in the same chapter of the book, and the weight of the edge indicates how often the two characters appear in the same chapter.

The actual data is in out.moreno_lesmis_lesmis. Open it and save it as a CSV file:

[Figure: the data stored as CSV]
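
Instead of converting the file by hand, the step could also be scripted. Below is a minimal sketch that uses only the standard library. It assumes the KONECT file may start with comment lines beginning with %, that the columns are whitespace-separated as the README describes, and that the weights are integers. The function name konect_to_csv and the header names source, target and weight are my own choices, not part of the dataset.

```python
import csv
import io


def konect_to_csv(lines, out_file):
    """Write edge lines as CSV with a header row; return the number of edges."""
    writer = csv.writer(out_file)
    writer.writerow(["source", "target", "weight"])  # hypothetical header names
    count = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and (assumed) '%' comment lines
        if not line or line.startswith("%"):
            continue
        fields = line.split()
        source, target = int(fields[0]), int(fields[1])
        # The third column is optional according to the README; default to 1
        weight = int(fields[2]) if len(fields) > 2 else 1
        writer.writerow([source, target, weight])
        count += 1
    return count


# Small in-memory sample; the first line is an illustrative comment line
sample = """% illustrative comment line
1 2 1
2 3 8
2 4 10
"""
buffer = io.StringIO()
n_edges = konect_to_csv(sample.splitlines(), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

For the real file, the same function could be called with two open file objects, for example open('out.moreno_lesmis_lesmis') as input and open('out.csv', 'w', newline='') as output.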

Data processing

The initialization code for the undirected graph in networkx is:

import networkx as nx

g = nx.Graph()
g.add_nodes_from(range(1, 78))  # the nodes are numbered 1 to 77
g.add_edges_from([(1, 2, {'weight': 1})])  # one (u, v, attributes) tuple per edge

Initializing the nodes is easy, so the main task is initializing the edges: first convert the DataFrame into a list of rows, then wrap each weight in an attribute dict and convert each row into a tuple.

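import pandas as pd

df = pd.read_csv('out.csv')
res = df.values.tolist()  # each row is [source, target, weight]
for i in range(len(res)):
    # wrap the weight in the attribute dict that add_edges_from expects
    res[i][2] = {'weight': res[i][2]}
res = [tuple(x) for x in res]
print(res)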

The output of res is as follows (truncated):

[(1, 2, {'weight': 1}), (2, 3, {'weight': 8}), (2, 4, {'weight': 10}), (2, 5, {'weight': 1}), (2, 6, {'weight': 1}), (2, 7, {'weight': 1}), (2, 8, {'weight': 1})...]
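
As an aside, networkx can also build the graph straight from a DataFrame with nx.from_pandas_edgelist, which skips the manual list conversion. The sketch below is only an illustration: the column names source, target and weight are assumptions (use whatever header your CSV actually has), and the three rows are copied from the output above.

```python
import networkx as nx
import pandas as pd

# Three rows taken from the output above; the column names are hypothetical
df = pd.DataFrame(
    [(1, 2, 1), (2, 3, 8), (2, 4, 10)],
    columns=["source", "target", "weight"],
)

# edge_attr names the column(s) to store as edge attributes
g = nx.from_pandas_edgelist(df, source="source", target="target", edge_attr="weight")
print(g.edges(data=True))
```

With the real file, df would come from pd.read_csv('out.csv') instead of being built by hand.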

So the initialization code for the graph is:

g = nx.Graph()
g.add_nodes_from(range(1, 78))
g.add_edges_from(res)
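
If you would rather keep plain (u, v, weight) triples, add_weighted_edges_from is another option. A small sketch with three edges from the dataset; it also shows that nodes are created automatically when edges are added, so the explicit add_nodes_from call is mainly a safeguard for nodes that have no edges.

```python
import networkx as nx

# Plain (u, v, weight) triples, no attribute dict needed
triples = [(1, 2, 1), (2, 3, 8), (2, 4, 10)]

g = nx.Graph()
g.add_weighted_edges_from(triples)  # stores the third value under the 'weight' key

# Nodes 1, 2, 3 and 4 were created implicitly by the edges
print(g.number_of_nodes(), g.number_of_edges())
print(g[2][3]["weight"])
```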

Draw

nx.draw(g)
plt.show()

[Figure: plot of the graph]
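
The default drawing ignores the edge weights. One possible way to make them visible is to scale the line width with the weight. This is only a sketch on four edges from the dataset: the scale factor 0.5, the node size and the fixed layout seed are arbitrary choices of mine, and the Agg backend plus the in-memory buffer are there only so the example runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import networkx as nx

g = nx.Graph()
g.add_weighted_edges_from([(1, 2, 1), (2, 3, 8), (2, 4, 10), (2, 5, 1)])

# A fixed seed makes the layout reproducible between runs
pos = nx.spring_layout(g, seed=42)

# Line width grows with the co-occurrence count (0.5 is an arbitrary scale)
widths = [0.5 * data["weight"] for _, _, data in g.edges(data=True)]

fig, ax = plt.subplots(figsize=(6, 6))
nx.draw_networkx_nodes(g, pos, ax=ax, node_size=300)
nx.draw_networkx_edges(g, pos, ax=ax, width=widths)
nx.draw_networkx_labels(g, pos, ax=ax, font_size=10)
ax.set_axis_off()

# Render to memory; see the note below for interactive use
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

For interactive use, drop the matplotlib.use("Agg") line and call plt.show() instead of saving to the buffer.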

Datasets that come with networkx

After spending a long time on the steps above, I found that networkx ships with its own datasets, including the character relationship graph of Les Misérables:

g = nx.les_miserables_graph()
nx.draw(g, with_labels=True)
plt.show()

[Figure: plot of the built-in graph with node labels]
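
Note that the built-in graph uses character names as nodes rather than the integer IDs of the KONECT file. A quick sketch to check its size against the numbers quoted at the top and to list the characters with the largest total edge weight; the variable names strength and top5 are just illustrative.

```python
import networkx as nx

g = nx.les_miserables_graph()

# Should match the 77 nodes and 254 edges mentioned in the dataset introduction
print(g.number_of_nodes(), g.number_of_edges())

# Weighted degree: the sum of the weights of all edges attached to a character
strength = dict(g.degree(weight="weight"))
top5 = sorted(strength, key=strength.get, reverse=True)[:5]
print(top5)
```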

Complete code

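# -*- coding: utf-8 -*-
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd

# The network has 77 nodes and 254 edges
df = pd.read_csv('out.csv')
res = df.values.tolist()

# Wrap each weight in an attribute dict: [u, v, w] -> [u, v, {'weight': w}]
for i in range(len(res)):
    res[i][2] = {'weight': res[i][2]}

res = [tuple(x) for x in res]
print(res)

# Initialize the graph from the CSV data and draw it
g = nx.Graph()
g.add_nodes_from(range(1, 78))
g.add_edges_from(res)
nx.draw(g)
plt.show()

# Alternatively, use the copy of the dataset that ships with networkx
g2 = nx.les_miserables_graph()
nx.draw(g2, with_labels=True)
plt.show()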

edited at: 01-31-2022