The second season of Arcane, a recent blockbuster series on Netflix based on the universe of one of the most popular online video games ever, League of Legends, is set in a fantasy world with heavy steampunk design, packed with astonishing visuals and a record-breaking budget. As a network and data scientist with a particular interest in turning pop cultural items into data visualizations, this was all I needed after finishing the final season: to map out the hidden connections and turn the storyline of Arcane into a network visualization, using Python. Hence, by the end of this tutorial, you will have hands-on skills for creating and visualizing the network behind Arcane.
However, these skills and methods are absolutely not specific to this story. In fact, they highlight the general approach network science offers to map out, design, visualize, and interpret networks of any complex system. Such systems range from transportation and COVID-19 spreading patterns to brain networks to various social networks, such as that of the Arcane series.
All images created by the author.
Since here we are going to map out the connections between all characters, first we need a list of each character. For this, the Arcane fan wiki site is an excellent source of free-to-use information (CC BY-SA 3.0), which we can easily access with simple web scraping techniques. In particular, we will use urllib to download pages, and with BeautifulSoup we will extract the names and fan wiki profile URLs of each character listed on the main character page.
First, we download the HTML of the character listing site:
import bs4 as bs
from urllib.request import urlopen

url_char = 'https://arcane.fandom.com/wiki/Category:Characters'
sauce = urlopen(url_char).read()
soup = bs.BeautifulSoup(sauce, 'lxml')
Then I extracted all the potentially relevant names. One can easily figure out which tags to look for in the parsed HTML stored in the soup variable by right-clicking on a desired element (in this case, a character profile) and selecting the element-inspection option in any browser.
From this, I learned that the name and URL of a character are stored in a line which contains 'title=' but does not contain ':' (which corresponds to categories). Additionally, I created a still_character flag, which helped me decide which subpages on the character listing page still belong to legitimate characters of the story.
import re

chars = soup.find_all('li')
still_character = True
names_urls = {}

for char in chars:
    if '" title="' in str(char) and ':' not in char.text and still_character:
        char_name = char.text.strip().rstrip()
        if char_name == 'Arcane':
            still_character = False
        char_url = 'https://arcane.fandom.com' + re.search(r'href="([^"]+)"', str(char)).group(1)
        if still_character:
            names_urls[char_name] = char_url
The previous code block creates a dictionary ('names_urls') which stores the name and URL of each character as key-value pairs. Now let's take a quick look at what we have and print the name-URL dictionary and its total length:
for name, url in names_urls.items():
    print(name, url)
A sample of the output from this code block, where we can test each link, pointing to the biography profile of each character:
print(len(names_urls))
This code cell returns the result of 67, the total number of named characters we have to deal with. This means we are already done with the first task: we have a comprehensive list of characters as well as easy access to their full textual profiles on their fan wiki sites.
To map out the connections between two characters, we need a way to quantify the relationship between each pair. To capture this, I rely on how frequently the two characters' biographies reference each other. On the technical end, to achieve this, we will need to collect the full biographies we just got the links to. We will get them again using simple web scraping techniques, and then save the source of each page in a separate file locally, as follows.
# output folder for the profile htmls
import os
folderout = 'fandom_profiles'
if not os.path.exists(folderout):
    os.makedirs(folderout)

# crawl and save the profile htmls
for ind, (name, url) in enumerate(names_urls.items()):
    if not os.path.exists(folderout + '/' + name + '.html'):
        fout = open(folderout + '/' + name + '.html', "w")
        fout.write(str(urlopen(url).read()))
        fout.close()
By the end of this section, our folder 'fandom_profiles' should contain the fan wiki profile of every Arcane character, ready to be processed as we work our way towards building the Arcane network.
To build the network between characters, we assume that the intensity of interaction between two characters is signaled by the number of times each character's profile mentions the other. Hence, the nodes of this network are the characters, linked by connections of varying strength based on the number of times each character's wiki page source references the other character's wiki.
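As a toy illustration of this weighting (the biography snippet and character identifier below are invented), the mention count is just a substring count of one character's wiki identifier in the other's page source:

```python
# toy example of the co-reference weighting; the snippet and slug are invented
bio_html = 'Vi grew up alongside <a href="/wiki/Jinx">Jinx</a> in Zaun, and later chased /wiki/Jinx across Piltover.'
slug = 'Jinx'  # the identifier part of the character's wiki URL

# count occurrences of '/<slug>' in the page source, as in the edge-building loop
w = bio_html.count('/' + slug)
print(w)
```

Here both the link tag and the plain URL mention are counted, giving a weight of 2 for this pair.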
Building the network
In the following code block, we build up the edge list: the list of connections containing both the source and the target node (character) of each connection, as well as the weight (co-reference frequency) between the two characters. Additionally, to conduct the in-profile search effectively, I create a names_ids dictionary which contains only the specific identifier of each character, without the rest of the web address.
# extract the name mentions from the html sources
# and build the list of edges in a dictionary
edges = {}
names_ids = {n : u.split('/')[-1] for n, u in names_urls.items()}

for fn in [fn for fn in os.listdir(folderout) if '.html' in fn]:
    name = fn.split('.html')[0]

    with open(folderout + '/' + fn) as myfile:
        text = myfile.read()
        soup = bs.BeautifulSoup(text, 'lxml')
        text = ' '.join([str(a) for a in soup.find_all('p')[2:]])
        soup = bs.BeautifulSoup(text, 'lxml')

        for n, i in names_ids.items():
            w = text.split('Image Gallery')[0].count('/' + i)
            if w > 0:
                edge = '\t'.join(sorted([name, n]))
                if edge not in edges:
                    edges[edge] = w
                else:
                    edges[edge] += w
len(edges)
Running this code block returns around 180 edges.
Next, we use the NetworkX graph analytics library to turn the edge list into a graph object and output the number of nodes and edges in the graph:
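Before turning the edge list into a graph, we can already peek at the strongest co-reference pairs by sorting the dictionary by weight. A minimal sketch with a made-up toy dictionary (in the tutorial, the real `edges` dict built above plays this role):

```python
# toy stand-in for the edges dict built above: keys are tab-joined name pairs,
# values are co-reference counts (these numbers are invented)
edges = {'Jinx\tVi': 42, 'Silco\tVi': 7, 'Caitlyn\tVi': 19, 'Jayce\tViktor': 31}

# sort the (pair, weight) items by descending weight and show the top pairs
top = sorted(edges.items(), key=lambda kv: kv[1], reverse=True)
for pair, w in top[:3]:
    print(pair.replace('\t', ' -- '), w)
```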
# create the networkx graph from the dict of edges
import networkx as nx
G = nx.Graph()
for e, w in edges.items():
    if w > 0:
        e1, e2 = e.split('\t')
        G.add_edge(e1, e2, weight=w)

G.remove_edges_from(nx.selfloop_edges(G))

print('Number of nodes: ', G.number_of_nodes())
print('Number of edges: ', G.number_of_edges())
The output of this code block:
This output tells us that while we started with 67 characters, 16 of them ended up not being connected to anyone in the network, hence the smaller number of nodes in the constructed graph.
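To see exactly which characters were dropped, one option is to compare the full name list against the graph's node set. A small sketch over a hypothetical character list (in the tutorial, `names_urls.keys()` and the real `G` would play these roles):

```python
import networkx as nx

# hypothetical mini example: four listed characters, one never co-referenced
all_characters = {'Jinx', 'Vi', 'Vander', 'Babette'}
G = nx.Graph()
G.add_edge('Jinx', 'Vi', weight=5)
G.add_edge('Vi', 'Vander', weight=3)

# characters that never ended up in any edge
unconnected = all_characters - set(G.nodes())
print(sorted(unconnected))
```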
Visualizing the network
Once we have the network, we can visualize it! First, let's create a simple draft visualization of the network using Matplotlib and the built-in tools of NetworkX.
# take a very brief look at the network
import matplotlib.pyplot as plt
f, ax = plt.subplots(1, 1, figsize=(15, 15))
nx.draw(G, ax=ax, with_labels=True)
plt.savefig('test.png')
The output image of this cell:
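One optional refinement of this draft, before moving on to Gephi, is to scale node sizes by degree and edge widths by weight, still within NetworkX. A sketch over a toy weighted graph (the real `G` would be used instead; the edges and weights here are invented):

```python
import networkx as nx
import matplotlib
matplotlib.use('Agg')  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# toy weighted graph standing in for G; edges and weights are invented
G = nx.Graph()
G.add_weighted_edges_from([('Jinx', 'Vi', 10), ('Vi', 'Vander', 8),
                           ('Jinx', 'Silco', 6), ('Caitlyn', 'Jayce', 4)])

pos = nx.spring_layout(G, seed=42)                        # fixed seed for a reproducible layout
sizes = [300 * G.degree(n) for n in G.nodes()]            # node size ~ number of connections
widths = [G[u][v]['weight'] / 2.0 for u, v in G.edges()]  # edge width ~ co-reference count

f, ax = plt.subplots(1, 1, figsize=(15, 15))
nx.draw(G, pos=pos, ax=ax, with_labels=True, node_size=sizes, width=widths)
plt.savefig('test_weighted.png')
```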
While this network already gives a few hints about the main structure and most frequent characteristics of the show, we can design a much more detailed visualization using the open-source network visualization software Gephi. For this, we first need to export the network into a .gexf graph data file, as follows.
nx.write_gexf(G, 'arcane_network.gexf')
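A quick way to sanity-check the export is to read the .gexf file back with NetworkX and confirm the node and edge counts survive the round trip. A minimal sketch with a toy graph and a demo filename (not the article's actual file):

```python
import networkx as nx

# toy graph and demo filename; the real export above writes 'arcane_network.gexf'
G = nx.Graph()
G.add_edge('Jinx', 'Vi', weight=10)
G.add_edge('Vi', 'Vander', weight=8)
nx.write_gexf(G, 'arcane_network_demo.gexf')

# read it back and compare node/edge counts
H = nx.read_gexf('arcane_network_demo.gexf')
print(H.number_of_nodes(), H.number_of_edges())
```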
Now, the tutorial on how to visualize this network using Gephi:
Extras
Here comes an extension part which I refer to in the video. After exporting the node table, including the network community indices, I read that table using Pandas and assigned individual colors to each community. I got the colors (and their hex codes) from ChatGPT, asking it to align with the main color themes of the show. Then, this block of code exports the colors, which I again used in Gephi to color the final graph.
import pandas as pd
nodes = pd.read_csv('nodes.csv')

pink   = '#FF4081'
blue   = '#00FFFF'
gold   = '#FFD700'
silver = '#C0C0C0'
green  = '#39FF14'

cmap = {0 : green,
        1 : pink,
        2 : gold,
        3 : blue,
       }

nodes['color'] = nodes.modularity_class.map(cmap)
nodes.set_index('Id')[['color']].to_csv('arcane_colors.csv')
As we color the network based on the communities we found (communities meaning highly interconnected subgraphs of the original network), we uncover four major groups, each corresponding to a specific set of characters within the storyline. Not so surprisingly, the algorithm clustered together the main protagonist family of Jinx, Vi, and Vander (pink). Then we also see the cluster of the underground figures of Zaun (blue), such as Silco, while the elite of Piltover (gold) and the militarist enforcers (green) are also well grouped together.
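If you prefer to stay in Python rather than Gephi for the community step, NetworkX ships a greedy modularity algorithm that produces comparable groupings. A sketch on a toy graph (the real `G` would be passed instead; the edges and weights here are invented):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# toy stand-in for the Arcane graph; edges and weights are invented
G = nx.Graph()
G.add_weighted_edges_from([('Jinx', 'Vi', 10), ('Vi', 'Vander', 8),
                           ('Jinx', 'Silco', 2), ('Silco', 'Sevika', 7),
                           ('Caitlyn', 'Jayce', 4), ('Jayce', 'Viktor', 9)])

# detect communities: each one is a frozenset of character names
communities = greedy_modularity_communities(G, weight='weight')
for i, community in enumerate(communities):
    print(i, sorted(community))
```

The community index `i` plays the same role as the modularity_class column exported from Gephi above.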
The beauty and use of such community structures is that, while such explanations put them in context very easily, it would usually be very hard to come up with a similar map based on intuition alone. The methodology presented here clearly shows how we can use network science to extract the hidden connections of virtual (or real) social systems, be it the partners of a law firm, the co-workers of an accounting firm, or the HR department of a major oil company.