Experimental Evaluation of Distributed k-Core
Decomposition
Bin Guo
Department of Computing & Information Systems
Trent University
Peterborough, Canada
Runze Zhao
Department of Computing & Software
McMaster University
Hamilton, Canada
zhaorz@mcmaster.ca
Abstract—Given an undirected graph, the k-core is a subgraph in which each node has at least k connections; it is widely used in graph analytics to identify core subgraphs within a larger graph. The sequential k-core decomposition algorithm is limited by memory constraints, and data graphs may be inherently distributed. A distributed approach overcomes these limitations by allowing each vertex to perform its calculation independently, using only local information. This paper presents an experimental evaluation of a distributed k-core decomposition algorithm. Assuming that each vertex is a client acting as a single computing unit, we simulate the process in Golang, leveraging its Goroutines and message passing. Because real-world data graphs can be large, with millions of vertices, it is expensive to build a distributed environment with millions of clients to run the experiments in a real-life scenario. Our simulation therefore effectively evaluates the running time and message passing of the distributed k-core decomposition.
Index Terms—k-core decomposition, distributed, Golang, message passing
I. INTRODUCTION
Graphs are fundamental data structures for modeling real applications; they are mathematical representations of relationships between objects such as individuals, knowledge, and positions. In a graph, each vertex represents an object and each edge represents a relationship between a pair of objects. For example, the social network of Facebook can be represented as graph data in which each user is a vertex and the relationships between users are edges.
Since many real-world applications can be modeled as
graphs, graph analytics has attracted much attention from
both research and industry communities. Many algorithms have been proposed to analyze large data graphs, including graph trimming, strongly connected component (SCC) decomposition, k-core decomposition, k-truss decomposition, etc.
Among these algorithms, the k-core decomposition analyzes the structure of a graph by identifying its core subgraphs. The k-core of a graph is a maximal subgraph in which each node has at least k connections; the core number of a node is the largest k such that the node belongs to a k-core.
There are various applications [1] for the k-core decomposition, listed below:
Biology. The k-core and phylogenetic analysis of Protein-Protein Interaction networks help predict the features of proteins whose functions are unknown.
Social Networks. The k-core decomposition is widely used to reveal network structure. It can identify the key nodes in a network or measure the influence of users in online social networks.
Computer Science. The k-core decomposition can be used to study large-scale Internet graphs. It easily reveals ordered and structural features of the networks. The k-core subgraphs also reveal the primary hierarchical layers of the network and permit their analytical characterization.
The widely used sequential algorithm for the k-core decomposition was proposed by Batagelj and Zaversnik, the so-called BZ algorithm [2]. It recursively removes vertices (and incident edges) with degrees less than k. The algorithm uses bucket sorting and runs in O(m + n) time, where m is the number of edges and n is the number of vertices. The sequential BZ algorithm has two limitations:
First, the graph must be loaded into memory, since the BZ algorithm requires random access to the whole graph during computation. Some graphs may therefore be too large to fit in a single host due to memory restrictions.
Second, the graph can be inherently distributed over a collection of hosts, with each host holding a partial subgraph (one-to-many), in which case it is not convenient to move each portion to a central host. Furthermore, each vertex can be an independent host and the edges can be the connections between different hosts (one-to-one), e.g., mobile phone networks.
To overcome the above limitations, distributed graph algorithms were proposed in [3]. In this paper, we focus on the one-to-one model, in which each vertex in the graph is a single host that calculates its own core number simultaneously. Hence, only tiny memory and computing resources are required for each vertex, and no central host stores the entire graph.
Real-world graphs are large and often have millions of vertices and edges. It is expensive to build a distributed physical environment with millions of clients to simulate vertices one-to-one in order to test distributed algorithms. Instead,
we can simulate the distributed algorithm on a single machine by choosing a highly concurrent programming language that supports lightweight threads and message passing, e.g., Golang.
In this paper, we simulate the distributed k-core decomposition algorithm to explore its runtime behavior. The experiment is conducted in Golang to simulate a distributed runtime environment. In our experiments, we use real-world data graphs whose sizes vary from thousands to millions of vertices. The core number of each vertex is calculated, and we record the number of messages passed between vertices for analysis.
The distributed k-core decomposition algorithm is explained
in Section II. The details of the implementation are demon-
strated in Section III. Our experiments are described in Sec-
tion IV.
II. PRELIMINARIES
In this section, we review the distributed k-core decompo-
sition algorithm.
Let G = (V, E) be an undirected, unweighted graph, where V(G) denotes the set of vertices (or nodes) and E(G) represents the set of edges in G. When the context is clear, we use V and E instead of V(G) and E(G) for simplicity. As G is an undirected graph, an edge (u, v) ∈ E(G) is equivalent to (v, u) ∈ E(G). We denote the numbers of vertices and edges of G by n and m, respectively. The set of neighbors of a vertex u ∈ V is defined as u.adj = {v ∈ V : (u, v) ∈ E}. The degree of a vertex u ∈ V is denoted u.deg = |u.adj|.
A. The k-Core Decomposition
Definition II.1 (k-Core [4]). Given an undirected graph G = (V, E) and a natural number k, an induced subgraph G_k of G is called a k-core if it satisfies: (1) for all u ∈ V(G_k), u.deg ≥ k, and (2) G_k is maximal. Moreover, G_{k+1} ⊆ G_k for all k ≥ 0, and G_0 is just G.
Definition II.2 (Core Number [4]). Given an undirected graph G = (V, E), the core number of a vertex u ∈ V(G), denoted u.core, is defined as u.core = max{k : u ∈ V(G_k)}. That means u.core is the largest k such that there exists a k-core containing u.
Definition II.3 (k-Core Decomposition [4]). Given a graph G = (V, E), the problem of computing the core number of each u ∈ V(G) is called k-core decomposition.
Example II.1. Figure 1 demonstrates an example of k-core decomposition. In the graph, the circles are the nodes, and the letters A, B, C, D, E, F, G, H inside them are their IDs. The black lines connecting two nodes are called edges. The edges are undirected, and each node is aware of its connected neighbors. The number of neighbors of a node is its degree.
The three dotted circles demonstrate the k-cores of this graph. As mentioned, a k-core is a maximal subgraph in which every node has degree at least k. The whole graph is the 1-core since all nodes have degree at least 1. The nodes A, B, E, F, G, H form a subgraph in which every node has degree at least 2, which is the 2-core. The subgraph containing nodes A, B, E, F is the 3-core because all nodes in this subgraph have degree at least 3.

Fig. 1: An Example of k-Core Decomposition
The core number of a node is the largest k such that the node is part of the k-core. In this example, node A is part of the 1-core, 2-core, and 3-core, so the largest k is 3 and its core number is 3. Similarly, nodes B, E, F have a core number of 3; nodes G, H have a core number of 2; and nodes C, D have a core number of 1.
The process of computing the core number of each node in a graph is called k-core decomposition.
B. Distributed Core Decomposition
Theorem II.1 (Locality [3]). For all u ∈ V, its core number, u.core, is the largest value k such that u has at least k neighbors whose core numbers are not less than k. Formally, u.core = k, where k ≤ |{v ∈ u.adj : v.core ≥ k}| and (k + 1) > |{v ∈ u.adj : v.core ≥ (k + 1)}|.
Theorem II.1 shows that a vertex u can calculate its core number using only its neighbors' information. The procedure works in the following steps [3]:
1) In the initialization stage, each u ∈ V initializes its estimated core number to its degree. Each u also maintains the estimated core numbers of all its neighbours in u.adj. Then, each u sends its estimated core number to all neighbours.
2) Each u receives the estimated core numbers from its neighbours v ∈ u.adj and stores them locally. Then, u waits until it has received estimated core numbers from all neighbours before it calculates its own core number k. If k is not equal to its current estimate, u updates its estimated core number with Theorem II.1 and sends the updated core number to all neighbours.
3) Each vertex u ∈ V executes the above distributed algorithm in parallel. The termination condition is that all vertices u ∈ V satisfy Theorem II.1 and thus stop decreasing their estimated core numbers at the same time. Finally, all vertices obtain their calculated core numbers.
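As an illustration, the local update rule of step 2 can be sketched in Go as follows. This is a minimal sketch: the function name and its use of a plain slice of neighbour estimates are our own choices for illustration, not the exact implementation.

// computeCore returns the largest k such that at least k of the
// neighbour estimates are >= k, capped by the current estimate
// (Theorem II.1). A hypothetical helper, not the authors' exact code.
func computeCore(current int, neighborEst []int) int {
	// count[i] = number of neighbours whose estimate is at least i,
	// with estimates above current clamped to current.
	count := make([]int, current+1)
	for _, e := range neighborEst {
		if e > current {
			e = current
		}
		if e > 0 {
			count[e]++
		}
	}
	// Scan k from current down to 1, accumulating how many neighbours
	// have an estimate >= k; the first k covered by at least k
	// neighbours is the new estimate.
	covered := 0
	for k := current; k >= 1; k-- {
		covered += count[k]
		if covered >= k {
			return k
		}
	}
	return 0
}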
a) Message Complexity: The performance of the dis-
tributed k-core decomposition algorithm can be measured
using time complexity or message complexity [3]. The time
complexities are used to measure the total running time, while
the message complexities are used to measure the number
of messages passed between different nodes to complete the
k-core decomposition. The number of messages that pass through the Go channels can be recorded at run time, without interfering with running the program locally or on a real distributed network. For distributed algorithms, since most of the running time is spent on message passing through networks, which is much slower than accessing memory, we mainly analyze the message complexity.
We analyze the message complexity in the standard work-depth model [5]. The work, denoted W, is the total number of operations that the algorithm uses. The depth, denoted D, is the longest chain of sequential operations. The work W is the total number of messages sent while the degree estimates decrease to the core numbers, W = O(Σ_{u∈V} u.deg · (u.deg − u.core)), since a vertex must send messages to notify all neighbors each time its estimated core number decreases by one.
In the worst case, the process degenerates to sequential execution, e.g., on a chain graph. In other words, the whole process needs n rounds to converge in the worst case, and the depth D is equal to the work W. However, in real graphs, e.g., social networks and communication networks, such a worst case rarely happens and the algorithm runs in parallel with high probability, as such chain structures rarely exist in most real graphs. Normally, it takes only a few rounds, such as 1 to 10, to converge.
In this paper, our experiments evaluate the total number of messages passed during the k-core decomposition. More importantly, we analyze the number of messages passed over different time intervals, as well as how soon each node completes its computation, to understand how the distributed k-core decomposition behaves over time.
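To record these counts during the simulation, a shared counter can be incremented whenever a message is placed on a channel. The following is a minimal, assumed sketch using the standard sync/atomic package, not the authors' exact instrumentation; the sendMsg field names are also assumptions.

package main

import (
	"fmt"
	"sync/atomic"
)

// sendMsg carries a sender's ID and its current estimated core number.
type sendMsg struct {
	ID   string
	Core int
}

// msgCount is a global counter of all messages sent through channels.
var msgCount int64

// countedSend wraps a channel send so that every message is counted.
func countedSend(ch chan sendMsg, m sendMsg) {
	atomic.AddInt64(&msgCount, 1)
	ch <- m
}

func main() {
	ch := make(chan sendMsg, 1)
	countedSend(ch, sendMsg{ID: "A", Core: 2})
	<-ch
	fmt.Println("messages sent:", atomic.LoadInt64(&msgCount))
}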
C. Termination Detection
Concurrent programming involves multiple processes exe-
cuting simultaneously, often leading to complex interactions.
Termination detection is a critical aspect of concurrent pro-
gramming, ensuring that processes finish execution properly
without deadlocks or infinite loops. There are several well-
developed termination detection algorithms:
Chandy-Lamport Snapshot Algorithm [6]: It operates by
taking snapshots of the local states of processes and the
states of communication channels.
Mattern’s Algorithm [7]: It is an improvement over the
Chandy-Lamport algorithm, using vector clocks to track
causal relationships between events.
Lai-Yang Algorithm [8]: This algorithm is another approach that leverages colored markers (white and red) to capture global states without requiring FIFO channels.
Dijkstra-Scholten Algorithm [9]: This algorithm uses a
hierarchical tree structure. The coordinator process at the
root collects messages from child processes, which in turn
collect messages from their children, and so on.
In this paper, for simplicity and efficiency, we use a centralized server for termination detection. That is, all clients send heartbeats to the server, so the server can terminate the algorithm when no heartbeat has been received for a period of time.
III. IMPLEMENTATION
In this section, we discuss the implementation of the dis-
tributed k-core decomposition algorithm.
A. Procedures
Our implementation contains the following procedures:
receive: Each node continuously runs the receive function until termination. This function receives incoming messages from the neighbors and triggers the calculation of the node's core number.
send: The receive function calls this function to send the node's core number to its neighbors after it has been calculated.
updateCore: The receive function calls this function to recalculate the node's core number after receiving core numbers from its neighbors.
sendHeartBeat: Each node uses this function to send heartbeats to the central server.
receiveHeartBeat: The central server continuously runs this function to receive heartbeat messages from all nodes and determine whether the termination signal should be issued.
dataCleanse: This function pre-processes the graph data to make it usable for the simulation.
B. Golang Simulation
Golang, also known as Go, is a compiled programming language developed by Google (https://go.dev/doc/). Go has built-in support for
concurrent programming through Goroutines and channels.
As lightweight threads, Goroutines are multiplexed onto a
small number of Operating System threads and are auto-
matically scheduled by the Go runtime system. Goroutines
enable concurrent programming in Go, allowing functions
to be executed concurrently, independently of other parts
of the program. The study [10] shows that concurrency in
Go is easier to implement and has better performance than
Java. The Go channel is a powerful feature that facilitates
communication and synchronization between Goroutines. The
channels provide a way for Goroutines to send and receive
data to and from each other safely and efficiently. Channels
can also be used to synchronize the execution of Goroutines.
For example, a Goroutine may wait until it receives a signal
from another Goroutine through a channel before proceeding
with its execution.
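For instance, a minimal, self-contained Go example of this pattern (our own illustration, not code from the simulation) is the following:

package main

import "fmt"

func main() {
	done := make(chan bool)

	// The worker Goroutine signals on the channel when it finishes.
	go func() {
		fmt.Println("worker: computation finished")
		done <- true
	}()

	// The main Goroutine blocks here until the signal arrives.
	<-done
	fmt.Println("main: received signal, proceeding")
}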
a) Why Choose Golang: In this paper, we simulate the distributed k-core decomposition algorithm. Each vertex is a computational unit, which can be simulated as a lightweight thread. Since there can be millions of vertices in a tested data graph, our simulation has to execute millions of lightweight threads in parallel. In addition, vertices communicate by passing messages to each other, which can be simulated as message passing between lightweight threads. In short, our simulation experiments require a programming language that efficiently supports a large number of concurrent lightweight threads that can be synchronized by message passing.
In addition to Go, there exist many other programming languages that support concurrent lightweight threads and message passing, such as Erlang (https://www.erlang.org/docs), Haskell (https://www.haskell.org/), Elixir (https://elixir-lang.org/), and Rust (https://www.rust-lang.org/). However, Go supports a large number of lightweight threads, called Goroutines, running concurrently in parallel, with message passing through channels for each Goroutine. We can start a Goroutine simply with a "go func" statement, which is the most convenient compared to the other programming languages.
The tested data graphs contain up to millions of vertices. If the experiment ran in a real-life scenario, each vertex would require a single physical client to run the program, and all clients would communicate through public networks. It is hard to obtain such large resources to carry out the experiments. Simulating such an experiment requires strong concurrency and message-passing capabilities, where Go excels. Each physical client can be simulated by a single Goroutine, and the communication of all nodes can be synchronized by sending and receiving messages through channels. With such a simulation, we can run the experiments on a single machine, without a large number of physical machines or setting up a complex running environment.
C. Algorithm Implementation
a) Data Structure: Following is the implementation of
node data structure. Each node should maintain such data
structure and updated by exchange information with their
neighbours;
type Node struct {
ID string
coreNumber int
storedNeighborK map[string] int
status string
selfChan chan sendMsg
serverChan chan string
terminationChan chan bool
neighbors []chan sendMsg
}
ID: Each node has its own unique ID as an identifier.
coreNumber: Each node stores its own core number.
storedNeighborK: Each node stores its neighbours' core numbers.
status: Each node has two statuses:
Active: The node is actively calculating its core number and passing messages to its neighbours.
Inactive: The node is currently neither processing passed messages nor calculating its core number.
selfChan: Each node has its own channel to receive messages from neighbour nodes.
serverChan: The channel used by the server to receive heartbeats from all nodes.
terminationChan: Each node has a channel to receive the termination message from the server.
neighbors: Each node stores its neighbours' channels.
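As an illustration of how these nodes are driven, the simplified sketch below (our own assumption, not the authors' exact implementation) runs one Goroutine per node. It relies on the Node struct above, a sendMsg type with ID and Core fields, buffered channels, and a computeCore helper like the one sketched in Section II.

// run is a hypothetical per-node Goroutine loop: announce the initial
// estimate (the degree), then react to neighbour updates until told to
// terminate.
func (n *Node) run() {
	// Announce the initial estimate to all neighbours.
	for _, ch := range n.neighbors {
		ch <- sendMsg{ID: n.ID, Core: n.coreNumber}
	}
	for {
		select {
		case msg := <-n.selfChan:
			n.status = "Active"
			n.serverChan <- n.ID // heartbeat: this node is active
			n.storedNeighborK[msg.ID] = msg.Core
			// Wait until estimates from all neighbours have arrived.
			if len(n.storedNeighborK) < len(n.neighbors) {
				continue
			}
			est := make([]int, 0, len(n.storedNeighborK))
			for _, c := range n.storedNeighborK {
				est = append(est, c)
			}
			// Decrease the estimate if the locality rule requires it,
			// and notify all neighbours of the new value.
			if k := computeCore(n.coreNumber, est); k < n.coreNumber {
				n.coreNumber = k
				for _, ch := range n.neighbors {
					ch <- sendMsg{ID: n.ID, Core: k}
				}
			}
			n.status = "Inactive"
		case <-n.terminationChan:
			return
		}
	}
}

Each node would then be launched with a statement such as "go node.run()".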
Fig. 2: An example of distributed k-core decomposition with message passing: (a) initially, the core numbers are the degrees; (b) in the first round, each node sends its core number to its neighbors.

Fig. 3: An example of distributed k-core decomposition with message passing (continued): (c) in the second round, nodes B and C send their new core numbers to their neighbors; (d) in the final round, all nodes finish the k-core decomposition.

b) Message Passing:
Example III.1. Fig. 2 and Fig. 3 show the distributed k-core decomposition procedure on a small example graph. The circles are the nodes, and the letters A, B, C, D, E, F, G, H inside them are their IDs. The Active nodes are colored red, the Inactive nodes are colored green, and the arrows between the nodes demonstrate how messages are passed. The table beside each node indicates the data structure stored in that node. A message contains the sender's ID and core number; for example, in the message {A, 2}, A is the ID and 2 is the core number.
Initially, as shown in Fig. 2(a), all nodes are initialized with their degree as their coreNumber; all nodes are aware of their neighbors but do not know the coreNumber of their neighbors.
In the first round, as shown in Fig. 2(b), all nodes send their coreNumber to all neighbors. Nodes A, D, and E do not need to decrease their coreNumber. Nodes B and C decrease their coreNumber to 2 according to the distributed k-core decomposition algorithm.
In the second round, as shown in Fig. 3(c), nodes B and C send their new coreNumber to their neighbors.
Finally, as shown in Fig. 3(d), all nodes have the updated coreNumber from nodes B and C; no node needs to update its coreNumber; all nodes enter the Inactive status.
c) Termination Detection: To reduce the resources and the number of messages used for termination detection, we use the centralized termination detection approach [3]. We use a central server process and a dedicated channel, called the server channel, to directly collect messages from all nodes. This is a classic Master-Worker paradigm [11], which involves a master node (the central server) that coordinates the actions of the worker nodes (processes), which report their status to the master. This concept is closely related to centralized monitoring and failure detection in distributed systems, and it embodies the principles of centralized coordination [12] and monitoring in distributed systems. This approach is easy to implement and lightweight on each node: a node does not need to store or forward messages from other nodes.
Initially, all nodes start with the Active status since they need to send their degrees to their neighbours.
Once the Active nodes complete the core calculation using the distributed k-core decomposition algorithm and send the result to their neighbours, they change their status to Inactive.
An Inactive node turns back into the Active status as soon as it receives an updated core number from a neighbour, because the updated core number triggers a recalculation of its own core number.
The message sent to the server is called a heartbeat. It is a message that each node sends to the server to report its status. Only Active nodes generate and send heartbeats to the server. Inactive nodes do not send heartbeat messages, because the server only needs to know whether there are still Active nodes in the system. If any Active nodes exist, the server will not send a termination notice to the nodes.
A node sends a heartbeat message to the server under either of the following conditions:
When a node receives an updated core number from a neighbour, it sets its status to Active and immediately sends a heartbeat to the server.
All Active nodes send heartbeats to the server every 10 seconds. If the core calculation takes too long, a node uses this periodic heartbeat to inform the server of its Active status. This prevents false termination in the event that no heartbeat is received while some nodes are still calculating core numbers.
The server Goroutine constantly reads messages from the server channel and checks whether any heartbeats have arrived in the past 30 seconds. If no heartbeat has arrived from any node for 5 minutes, the server sends a termination message to all nodes' termination channels to stop them. The core number stored on each node is then considered the final result. The 30-second check interval is much longer than the heartbeat interval, which is only 10 seconds. This ensures that there is no pending message in the system and that no node is still taking a long time to calculate its core number. The downside of using a long check interval is that termination is delayed, so the system does not terminate as soon as the core numbers are computed.
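A minimal sketch of such a server loop, using the standard time package and the intervals described above, could look like the following; the parameter names and overall structure are our own simplification, not the authors' exact code.

// runServer is a hypothetical sketch of the central server's
// receiveHeartBeat loop: it drains heartbeats from the server channel,
// checks every 30 seconds whether any heartbeat has arrived, and
// broadcasts termination after 5 minutes of silence.
func runServer(serverChan chan string, terminationChans []chan bool) {
	lastHeartbeat := time.Now()
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-serverChan:
			// Any heartbeat means at least one node is still Active.
			lastHeartbeat = time.Now()
		case <-ticker.C:
			if time.Since(lastHeartbeat) >= 5*time.Minute {
				// No Active nodes for long enough: stop every node.
				for _, ch := range terminationChans {
					ch <- true
				}
				return
			}
		}
	}
}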
IV. EXPERIMENTS
In this section, we carry out extensive experiments to evaluate the message complexity of the distributed k-core decomposition algorithm using the simulation results for each data graph. The analysis covers the total number of passed messages, the number of passed messages over time intervals, and the number of Active nodes over time intervals.
A. Experiment Setup
The experiments are performed on a server with an AMD
CPU (64 cores, 128 hyperthreads, 256 MB of last-level
shared cache) and 256 GB of main memory. The server
runs the Ubuntu Linux (version 22.04) operating system. The
distributed k-core decomposition algorithm is implemented in
Golang (version 1.21). The code implementation is shared in a GitHub repository (https://github.com/Marcus1211/MEng).
B. Tested Graphs
We select 14 real-world graphs in seven categories. All graphs are obtained from the Stanford Large Network Dataset Collection (SNAP, https://snap.stanford.edu/data/) and are shown in Table I. The detailed graph descriptions are summarized as follows:
soc-pokec-relationships (SPR): the most popular online social network in Slovakia.
musae-PTBR-features (PTBR): a Twitch user-user network of gamers who stream in a certain language.
facebook-combined (FC): this dataset consists of 'circles' (or 'friends lists') from Facebook.
musae-git-features (MGF): a large social network of GitHub developers collected from the public API in June 2019.
soc-LiveJournal1 (LJ1): a free online community with almost 10 million members; a significant fraction of these members are highly active.
email-Enron (EEN): the Enron email communication network, covering all the email communication within a dataset of around half a million emails.
email-EuAll (EEU): a network generated using email data from a large European research institution.
p2p-Gnutella31 (G31): a snapshot of the Gnutella peer-to-peer file sharing network from August 2002.
com-lj (CLJ): the LiveJournal friendship social network with ground-truth communities.
com-amazon (CA): an Amazon network with ground-truth communities defined by product categories provided by Amazon.
web-Stanford (WS): the nodes represent pages from Stanford University (stanford.edu), and the directed edges represent the hyperlinks between them.
web-Google (WG): the nodes represent web pages and the directed edges represent hyperlinks between them.
amazon0505 (A0505): a network collected by crawling the Amazon website.
soc-Slashdot0811 (S0811): the network contains friend/foe links between the users of Slashdot.
Each graph is pre-processed before the experiment. For simplicity, all directed graphs are converted to undirected graphs based on the following rules (see the sketch after this list):
A vertex cannot connect to itself.
Each pair of vertices can be connected by at most one edge.
All graphs are converted into JSON format, in which the key is a vertex and the value stores all of its neighbour vertices.
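A minimal sketch of this pre-processing step is given below. It is a hypothetical dataCleanse that assumes a whitespace-separated edge list as input (the usual SNAP format) and a hypothetical file name "edges.txt"; it is not the authors' exact implementation.

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// dataCleanse reads an edge list, drops self-loops and duplicate edges,
// treats every edge as undirected, and prints the adjacency lists as
// JSON (vertex ID -> list of neighbour IDs).
func dataCleanse(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	adj := make(map[string]map[string]bool)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip comments and blank lines
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] == fields[1] {
			continue // skip malformed lines and self-loops
		}
		u, v := fields[0], fields[1]
		if adj[u] == nil {
			adj[u] = make(map[string]bool)
		}
		if adj[v] == nil {
			adj[v] = make(map[string]bool)
		}
		adj[u][v] = true // set semantics removes duplicate edges
		adj[v][u] = true // make the edge undirected
	}
	if err := scanner.Err(); err != nil {
		return err
	}

	// Flatten the neighbour sets into lists and emit JSON.
	out := make(map[string][]string, len(adj))
	for u, nbrs := range adj {
		for v := range nbrs {
			out[u] = append(out[u], v)
		}
	}
	enc, err := json.Marshal(out)
	if err != nil {
		return err
	}
	fmt.Println(string(enc))
	return nil
}

func main() {
	if err := dataCleanse("edges.txt"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}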
a) Graph Properties: In Table I, we can see that the tested graphs have up to millions of edges. Their average degrees range from 2 to 46, and their maximum core numbers range from 8 to 376. Each column of Table I is explained below:
Type: directed or undirected. Directed graphs contain directed edges, which means that connected nodes are not mutually aware of each other. Undirected graphs have edges without a direction. Directed graphs must be converted into undirected graphs before the experiment, simply by removing the direction.
n = |V|: the number of vertices in the graph.
m = |E|: the number of edges in the graph.
AvgDeg (Average Degree): the number of neighbors of a node is its degree; the average degree is the sum of all degrees divided by the total number of nodes.
MaxDeg (Max Degree): the maximum degree among all nodes.
MaxCore: the maximum core number among all nodes after the k-core decomposition.
b) Core Number Distribution: In Fig. 4, we can see that the core numbers of vertices are not uniformly distributed in any of the tested graphs, where the x-axis is the core number and the y-axis is the number of vertices. That is, a great portion of vertices have small core numbers, and few have large core numbers. For example, LJ has 0.5 million vertices with a core number of 1, while PTBR and MGF have no vertices with a core number of 1. Although CA has more than 300,000 nodes, the core numbers of its vertices range only from 0 to 8.
C. Evaluate Total Number of Messages
Fig. 5 shows the total number of passed messages for each graph, where the x-axis is the tested graphs and the y-axis is the total number of messages passed for each graph to complete the distributed k-core decomposition. Generally, large graphs with more nodes and edges, for example WG, EEN, and A0505, require more messages to be passed to complete the simulation.
Category | Graph Name | Type | n = |V| | m = |E| | AvgDeg | MaxDeg | MaxCore
Social Networks | soc-pokec-relationships (SPR) | Directed | 1,632,803 | 30,622,564 | 29 | 14739 | 118
Social Networks | musae-PTBR-features (PTBR) | Undirected | 1,912 | 31,299 | 24 | 1635 | 21
Social Networks | facebook-combined (FC) | Undirected | 4039 | 88234 | 46 | 986 | 118
Social Networks | musae-git-features (MGF) | Undirected | 37,700 | 289,003 | 36 | 28191 | 29
Social Networks | soc-LiveJournal1 (LJ1) | Directed | 4,847,571 | 68,993,773 | 19 | 20314 | 376
Communication Networks | email-Enron (EEN) | Undirected | 36,692 | 183,831 | 10 | 1383 | 49
Communication Networks | email-EuAll (EEU) | Directed | 265,214 | 420,045 | 2 | 7631 | 44
Internet Peer-to-Peer Networks | p2p-Gnutella31 (G31) | Directed | 62,586 | 147,892 | 7 | 68 | 9
Ground-Truth Communities | com-lj (CLJ) | Undirected | 3,997,962 | 34,681,189 | 25 | 14208 | 360
Ground-Truth Communities | com-amazon (CA) | Undirected | 334,863 | 925,872 | 5 | 546 | 8
Web Graphs | web-Stanford (WS) | Directed | 281,903 | 2,312,497 | 14 | 38625 | 75
Web Graphs | web-Google (WG) | Directed | 875,713 | 5,105,039 | 10 | 6331 | 44
Product Co-Purchasing Networks | amazon0505 (A0505) | Directed | 410,236 | 3,356,824 | 12 | 2760 | 15
Signed Networks | soc-Slashdot0811 (S0811) | Directed | 77,357 | 516,575 | 13 | 2540 | 59
TABLE I: Tested Data Graphs
Fig. 4: The Core Number Distribution
However, the average degree can also affect the number of messages passed, even when the number of nodes is small. For example, MGF has 37,700 nodes with an average degree of 36, and FC has only 4,039 nodes but an average degree of 46. Both MGF and FC are considered small graphs but require a large number of messages to complete the simulation. A high average degree means that each node has more neighbors on average. When a node updates its core number, it needs to send more messages if it has more neighbors, and thus more messages are passed. The number of messages passed does not increase linearly with the average degree; it also depends on the core number distribution among all nodes.

Fig. 5: The Total Number of Passing Messages
D. Evaluate the Number of Messages Passed over Time Intervals
Fig. 6 and Fig. 7 show the number of messages passed in the system over time intervals, where the x-axis is the time interval at which each data point is collected and the y-axis is the total number of messages passed between all nodes when simulating the distributed k-core decomposition algorithm. Since different graphs take different times to complete the experiment (some large graphs take up to 3 days), we use time intervals instead of real time to plot the results. This helps demonstrate the general trend of each simulation regardless of how long it takes.

Fig. 6: Number of Passing Messages over Time Interval (a)
Fig. 7: Number of Passing Messages over Time Interval (b)
We can observe that most of the messages are passed in the first couple of time intervals. This is expected, as all nodes need to share their degrees with their neighbors when initiating the core decomposition. Every time a node updates its core number, it needs to pass messages to its neighbors; most of the nodes remain in the active state and keep updating their core numbers at the beginning of the simulation. Hence, a large amount of messages is passed during the first couple of time intervals. The number of messages passed decreases for all graphs as the simulation progresses, because more nodes finish calculating their core numbers; therefore, no more messages are sent from the inactive nodes.
The graph WG demonstrates a special case in which there is a spike in the number of messages passed during the middle of the simulation. This is due to the core calculation of nodes that have many neighbors: when they update their core numbers, they send messages to a large number of neighbors, which causes the neighbors to update their core numbers as well. Hence, a recursive effect results in a spike of passed messages.
E. Number of Active Nodes for Each Time Interval

Fig. 8: Number of Active Nodes over Time Interval (a)
Fig. 9: Number of Active Nodes over Time Interval (b)

Fig. 8 and Fig. 9 show the number of Active nodes over time, where the x-axis is the time interval at which each data point is collected and the y-axis is the number of Active nodes. Most of the nodes are in the Active state at the beginning of the simulation. As
stated in the implementation, the node will only enter the
Active state when it needs to decrease its core number
and send messages to its neighbors. Hence, more and more nodes enter the Inactive state once they finish calculating their core numbers and no longer receive messages from their neighbors that trigger further core number calculation. The number of Active nodes decreases at different rates despite running on the same machine. This is caused by the distribution of core numbers among the nodes. For some graphs, such as A0505 or EEN, most of the nodes have small core numbers, so they are processed quickly and enter the Inactive state. On the other hand, graphs like CA or MGF have more nodes with higher core numbers; it takes more time to process these nodes, so the number of Active nodes does not drop rapidly at the beginning of the simulation.
The speed of the distributed k-core decomposition algorithm is determined by the number of Active nodes remaining in the experiment. As the experiment progresses, more and more nodes turn into the Inactive state and eventually all nodes become Inactive. Nodes with a small core number always turn Inactive first, which means that the speed of the algorithm is determined by the core number distribution among all nodes; e.g., if most of the nodes have a high core number, it takes a longer time for the distributed k-core decomposition to finish.
F. Evaluate Total Running Time
Fig. 10 shows the total running time for each graph, where the x-axis is the tested graphs and the y-axis is the total running time in minutes for each graph to complete the distributed k-core decomposition. The results show a trend similar to that of the number of passed messages in Fig. 5, which means that a large portion of the running time is spent on passing messages during the computation.

Fig. 10: Total Running Time
However, Fig. 10 cannot be used as a reference for a real-world implementation of the distributed algorithm. If the distributed k-core decomposition algorithm were implemented in the real world, all messages would be sent over the Internet, and the geographical locations of the nodes would cause various delays in transferring messages. Our experiment leverages Go channels and computer memory, which generate less delay than the Internet for the smaller tested graphs but more delay than the Internet for the larger tested graphs. Therefore, as pointed out in Section II, the performance of the distributed k-core decomposition algorithm is evaluated based on message complexity rather than time complexity.
V. CONCLUSION AND FUTURE WORK
This paper presents an experimental evaluation of the distributed k-core decomposition algorithm on real-world graphs with up to millions of vertices. The algorithm is able to calculate the core number of each node without shared memory. Golang is used to simulate the distributed runtime environment, and we analyze the number of messages passed when executing the algorithm.
In the future, our experimental evaluation can be extended to other distributed graph algorithms, e.g., k-truss decomposition and SCC decomposition. In addition, instead of using Golang, we can develop a dedicated framework for simulating distributed algorithms that supports accurate simulation of message-passing latency.
REFERENCES
[1] Y.-X. Kong, G.-Y. Shi, R.-J. Wu, and Y.-C. Zhang, "k-core: Theories and applications," tech. rep., 2019.
[2] V. Batagelj and M. Zaversnik, "An O(m) algorithm for cores decomposition of networks," CoRR, vol. cs.DS/0310049, 2003.
[3] A. Montresor, F. De Pellegrini, and D. Miorandi, "Distributed k-core decomposition."
[4] B. Guo and E. Sekerinski, "Simplified algorithms for order-based core maintenance," The Journal of Supercomputing, pp. 1–32, 2024.
[5] J. JáJá, An Introduction to Parallel Algorithms. Reading, MA: Addison-Wesley, 1992.
[6] K. M. Chandy and L. Lamport, "Distributed snapshots: Determining global states of distributed systems," ACM Transactions on Computer Systems (TOCS), vol. 3, no. 1, pp. 63–75, 1985.
[7] F. Mattern, "Algorithms for distributed termination detection," Distributed Computing, vol. 2, no. 3, pp. 161–175, 1987.
[8] T. H. Lai and T. H. Yang, "On distributed snapshots," Information Processing Letters, vol. 25, no. 3, pp. 153–158, 1987.
[9] E. W. Dijkstra and C. S. Scholten, "Termination detection for diffusing computations," Information Processing Letters, vol. 11, no. 1, pp. 1–4, 1980.
[10] N. Togashi and V. Klyuev, "Concurrency in Go and Java: Performance analysis," in 2014 4th IEEE International Conference on Information Science and Technology, pp. 213–216, 2014.
[11] M. Van Steen and A. Tanenbaum, "Distributed systems principles and paradigms," Network, vol. 2, no. 28, p. 1, 2002.
[12] G. F. Coulouris, J. Dollimore, and T. Kindberg, Distributed Systems: Concepts and Design. Pearson Education, 2005.