Analysis of time-stamped data using higher-order networks

An educational tutorial using the free python module pyTempNets

Ingo Scholtes
Chair of Systems Design
ETH Z├╝rich

May 2015 (Updated: January 2017)

In this tutorial, I will introduce some basic concepts of the analysis of temporal network data using higher-order networks, which have been introduced in the following recent publications:

  • Y Zhang, A Garas, I Scholtes: Controllability of temporal networks: An analysis using higher-order networks, January 23 2017, arXiv 1701.06331
  • I Scholtes, N Wider, A Garas: Higher-Order Aggregate Networks in the Analysis of Temporal Networks: Path Structures and centralities, The European Physical Journal B, 89:61, March 2 2016, arXiv 1508.06467
  • I Scholtes, N Wider, R Pfitzner, A Garas, CJ Tessone, F Schweitzer: Causality-driven slow-down and speed-up of diffusion in non-Markovian temporal neworks, Nature Communications, 5, Sept. 2014, available online
  • R Pfitzner, I Scholtes, A Garas, CJ Tessone, F Schweitzer: Betweenness preference: Quantifying correlations in the topological dynamics of temporal networks, Phys Rev Lett, 110(19), 198701, May 2013, available online

A key point of these works is that we highlight the importance of non-Markovian characteristics in time-stamped relational data. In a nutshell, we show that the ordering of links in time-stamped data matters. We further introduce higher-order networks, a generalization of the common network perspective that allows to study the influence of order correlations in temporal networks. This new perspective on time-stamped network data provides interesting new opportunities not only to better understand dynamical processes on dynamic networks, but also for the definition of novel information retrieval and community detection algorithms that take into account both the topological and temporal dimension of relational data.

In the following, I illustrate some basic concepts of non-Markovian characteristics in temporal networks. In particular, I will showcase the use of pyTempNet, a free python module which simplies the analysis and visualization of time-stamped relational data, as well as the simulation of dynamical processes on top of temporal networks. All of the methods and concepts introduced in the two publications listed above are fully implemented in pyTempNet thus making it particularly easy to apply them to your data.

Installation of pyTempNets is easy. You will need a working installation of python3 properly set up. You will also have to have the modules numpy, scipy, matplotlib and igraph installed. This can be done easily by installing the Anaconda python 3.4 distribution, which you can get from here. Once you have installed Anaconda, you will just need to additionally set up igraph by invoking the following command:

> pip install python-igraph

You are then ready to install pyTempNet directly from gitHub by typing:

> pip install git+git://github.com/IngoScholtes/pyTempNets.git

After this, you should be ready to run this notebook.

Let us start with a new ipython notebook and let us import some basic modules required in the following tutorial. We first import the basic pyTempNet module as follows:

In [1]:
import pyTempNet as tn

Since time-aggregated networks are internally represented as igraph objects (which we can plot and analyze using standard igraph functions) we will also need to include igraph.

In [2]:
import igraph

Finally, we will want to plot some figures and do some basic calculations using numpy and matplotlib.

In [3]:
import numpy as np
import matplotlib.pyplot as plt

You won't have to care too much about the following imports and definitions, as they will simply allow us to directly embed igraph plots, Latex output and videos into the output of an ipython notebook.

In [4]:
from IPython.display import *
from IPython.display import HTML
import base64

from wand.image import Image
from wand.color import Color

from subprocess import call

def showVideo(filename):
    VIDEO_TAG = """<video controls>
 <source src="data:video/x-m4v;base64,{0}" type="video/mp4">
 Your browser does not support the video tag.
</video>"""
    bytes = open(filename, "rb").read()
    return HTML(VIDEO_TAG.format(base64.b64encode(bytes).decode('utf-8')))

def compileAndShowTex(file):
    call("cd " + file.split('/')[0] + " && pdflatex " + file.split('/')[1], shell=True)
    with Image(filename=file.replace('.tex', '.pdf'), resolution=200) as img:   
        img.resize(400,400)
        img.alpha_channel = True
        with Image(width=img.width, height=img.height, background=Color("white")) as bg:
            bg.composite(img, 0, 0)
            bg.alpha_channel = True
            bg.save(filename=file.replace('.tex', '.png'))
    display(Image(filename=file.replace('.tex', '.png')))

Two simple examples

Let us introduce some basic concepts using two simple examples for temporal networks. We first create an empty temporal network object, which we can then use to add time-stamped links one by one. In each call, we pass the source and target node, as well as an integer time stamp indicating when the link was present.

In [5]:
t1 = tn.TemporalNetwork()
t1.addEdge('a', 'c', 1)
t1.addEdge('c', 'e', 2)

t1.addEdge('b', 'c', 3)
t1.addEdge('c', 'd', 4)

t1.addEdge('b', 'c', 5)
t1.addEdge('c', 'e', 6)

t1.addEdge('a', 'c', 7)
t1.addEdge('c', 'd', 8)

A nice way to visualize such networks are so-called time-unfolded notations, in which we unfold time into a spatial (vertical) dimension. pyTempNet allows you to directly generate tikz code for such a representation. We just call the following function:

In [6]:
tn.Visualizer.exportTikzUnfolded(t1, 'tutorial/t1.tex')

This will produce a LaTeX file which we can simply compile to obtain a PDF figure (this comes in handy for illustrative figures in manuscripts). We can either manually compile the file and convert the resulting PDF figure to a PNG, or we can do this automatically using our little helper function (which requires imagemagick and ghostscript as well as imagemagick's python interface Wand to be installed on your system).

With this helper function, we can easily display the resulting figure in ipython as follows:

In [7]:
compileAndShowTex('tutorial/t1.tex')

In this representation, each of the five nodes is represented by multiple temporal copies. Two subsequent temporal copies are connected whenever a link existed at the respective time stamp. This representation makes it easy to study so-called time-respecting paths.

Apart from time-unfolded representations, pyTempNet can also easily be used to produce videos of temporal networks. For this we can simply export a sequence of PNG movie frames using the following built-in function:

In [8]:
tn.Visualizer.exportMovie(t1,'tutorial\\video.mp4', fps=1)
2016-12-03 13:21:50 [Severity.INFO]	Extracting two-paths for delta = 1...
2016-12-03 13:21:50 [Severity.INFO]	finished.
2016-12-03 13:21:50 [Severity.INFO]	Constructing first-order aggregate network ...
2016-12-03 13:21:50 [Severity.INFO]	finished.
2016-12-03 13:21:50 [Severity.WARNING]	No visual style specified, setting to defaults
2016-12-03 13:21:50 [Severity.INFO]	Encoding video ...
2016-12-03 13:21:51 [Severity.INFO]	finished.

This will internally generate a sequence of numbered frames within a subdirectory called frames. It will then automatically convert the resulting frames to an mp4 video using - for instance - the tool imagemagick. For this, make sure to install imagemagick for your platform.

We can now show the resulting video of the temporal network using the helper function defined above. Just click the play button.

In [9]:
showVideo('tutorial/video.mp4')
Out[9]:

Let us now study the extraction of so-called time-respecting paths in temporal networks. This crucial concept is the temporal equivalent to paths in static network. Different from static networks, links on a time-respecting path have to respect causality, i.e. in order for a path (u,v,t_1) -> (v,w,t_2) of two time-stamped links to connect nodes u and w, the first link has to be present before the second link, i.e. t_1<t_2. pyTempNet can automatically extract time-respecting paths of length two for you. Sometimes, you may also want to include a limit in the waiting time for time-respecting paths. I.e. rather than simply requiring t_1<t_2, you may want to impose that t_2-t_1 <= delta for some maximum waiting time delta.

The pyTempnet function extractTwoPaths can be used to extract time-respecting paths of length two. If no delta is specified, delta = 1 is assumed, i.e. only consecutive links will be considered to contribute to time-respecting paths.

Note that we can specify an arbitrary maximum time difference delta by using the function setMaxTimeDiff.

In [10]:
t1.setMaxTimeDiff(delta=1)
t1.extractTwoPaths()
2016-12-03 13:21:56 [Severity.INFO]	Extracting two-paths for delta = 1...
2016-12-03 13:21:56 [Severity.INFO]	finished.

Depending on your data set, this function may require a few seconds (if you have several hundred thousand or even Millions of time-stamped links). Once it has been completed, we can start analyzing causality structures and non-Markovian characteristics in the temporal network. If you omit the explicit call above, the call will automatically be made whenever time-respecting paths are first needed.

Based on the statistics of time-respecting paths, we can calculate betweenness preference, a measure that captures to what extent nodes mediate time-respecting paths between particular pairs of their neighbours in the static network. Being an information-theoretic measure, it can be interpreted in terms of the number of bits of information that are lost, when we aggregate all time-stamped links around a node (thus obtaining a time-aggregated network which misses the information on the ordering of links). For details on the definition and interpretation of the measure, I have to refer to the publications mentioned above.

With pyTempNet we can calculate the betweenness preference of a node using the following function. In this particular case, the betweenness preference of node c should be zero, as there is no preference for node c to mediate time-respecting paths between any particular pair of nodes (see also temporal network above).

In [11]:
print('I^b(S;D) = ', tn.Measures.BetweennessPreference(t1,'c'))
I^b(S;D) =  0.0

We can now calculate different time-aggregated representations of our temporal network. The simplest and most commonly used one is the plain time-aggregated network, in which all time-stamps are discarded, while link weights indicate the number or frequency of interactions between nodes.

We can obtain an igraph object representing the (first-order) aggregate network as follows. This igraph object can be plotted using basic igraph functions.

In [12]:
# We can compute and plot the first-order aggregate network
g1 = t1.igraphFirstOrder()

visual_style = {}
visual_style["bbox"] = (600, 400)
visual_style["margin"] = 60
visual_style["vertex_size"] = 80
visual_style["vertex_label_size"] = 24
visual_style["vertex_color"] = "lightblue"
visual_style["edge_curved"] = 0.2
visual_style["edge_width"] = 1
visual_style["edge_arrow_size"] = 2

visual_style["layout"] = g1.layout_auto()
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_label"] = g1.es["weight"]

igraph.plot(g1, 'tutorial/t1_G1.png', **visual_style)
display(Image(filename='tutorial/t1_G1.png'))

In one of the papers listed above, we introduced so-called higher-order time-aggregated networks which not only capture the frequency of interactions, but which also capture the ordering of time-stamped links. The idea is that each link in the time-stamped network is represented by a node in the second-order network, while each time-respecting path of length two is represented by a link. Again, details on this abstraction can be found in the papers above.

Here, we can compute (and plot) the second-order network for the temporal network above as follows:

In [13]:
g2 = t1.igraphSecondOrder()

visual_style["layout"] = g2.layout_auto()
visual_style["vertex_label"] = g2.vs["name"]
visual_style["edge_label"] = g2.es["weight"]
igraph.plot(g2, 'tutorial/t1_G2.png', **visual_style)
display(Image(filename='tutorial/t1_G2.png'))
2016-12-03 13:22:04 [Severity.INFO]	Constructing second-order aggregate network ...
2016-12-03 13:22:04 [Severity.INFO]	finished.

In the example above, we see that a total of four time-respecting paths of length two (represented by the four links) exist in the temporal network, each of them occurring exactly one time (thus the link weights of one).

Let us now see what happens if we simply flip the order of two time-stamped links in the temporal network. We do this by defining the following new temporal network (which is identical to the previous one except for one reordering).

In [14]:
t2 = tn.TemporalNetwork()
t2.addEdge('a', 'c', 1)
t2.addEdge('c', 'e', 2)

t2.addEdge('b', 'c', 3)
t2.addEdge('c', 'd', 4)

t2.addEdge('b', 'c', 5)
t2.addEdge('c', 'd', 6)

t2.addEdge('a', 'c', 7)
t2.addEdge('c', 'e', 8)

Again, we can produce a tikz figure showing the time-unfolded representation of this temporal network (which I again manually compiled and converted to a PNG file).

In [15]:
tn.Visualizer.exportTikzUnfolded(t2, 'tutorial/t2.tex')
compileAndShowTex('tutorial/t2.tex')

We now again extract all time-respecting paths of length two and then visualize the first-order time-aggregated network. Not surprisingly, this is exactly the same as before, since we only changed the ordering of time-stamped links.

In [16]:
t2.setMaxTimeDiff(delta=1)
t2.extractTwoPaths()

g1 = t2.igraphFirstOrder()

visual_style["layout"] = g1.layout_auto()
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_label"] = g1.es["weight"]
igraph.plot(g1, 't2_G1.png', **visual_style)
display(Image(filename='t2_G1.png'))
2016-12-03 13:22:15 [Severity.INFO]	Extracting two-paths for delta = 1...
2016-12-03 13:22:15 [Severity.INFO]	finished.
2016-12-03 13:22:15 [Severity.INFO]	Constructing first-order aggregate network ...
2016-12-03 13:22:15 [Severity.INFO]	finished.

As we will see now, there is an important change in terms of causality structures. Now, node c has a preference to mediate time-respecting paths between the pairs of nodes a and e as well as b and d respectively. Furthermore, knowing the "source" of a time-respecting path through c completely determines the target (source a determining target e and source b determining target d). Our uncertainity of two equally likely choices is reduced to one, which corresponds to a betweenness preference of one bit.

In [17]:
print('I^b(S;D) = ', tn.Measures.BetweennessPreference(t2,'c'))
I^b(S;D) =  1.0

We can also output the corresponding (unnormalized) betweenness preference matrix (again see paper for details). The entries provide us with the number of different time-respecting paths through node c. The first row corresponds to node a, the second row corresponds to node b, the first column corresponds to node e, the second column corresponds to node d. Here, the entries reveal that there are two time-respecting paths a -> c -> e (first row) and two time-respecting paths b -> c -> d (second row). Time-respecting paths a -> c -> d and b -> c -> e (off-diagonal zero entries) are absent.

In [19]:
print('I^b(S;D) = ', tn.Measures.BWPrefMatrix(t2,'c'))
I^b(S;D) =  [[ 2.  0.]
 [ 0.  2.]]

The changes in the statistics of time-respecting paths that are due to the reordering of the time-stamped edges is captured by the fact that the second-order aggregate network is different from the one before (even though the first-order aggregate network is the same!).

In [20]:
g2 = t2.igraphSecondOrder()

visual_style["layout"] = g2.layout_auto()
visual_style["vertex_label"] = g2.vs["name"]
visual_style["edge_label"] = g2.es["weight"]
igraph.plot(g2, 'tutorial/t2_G2.png', **visual_style)
display(Image(filename='tutorial/t2_G2.png'))
2016-12-03 13:22:34 [Severity.INFO]	Constructing second-order aggregate network ...
2016-12-03 13:22:34 [Severity.INFO]	finished.

In the second-order network above, we see that now each of the two time-respecting paths (a,c) -> (c,e) and (b,c) -> (c,d)occurs twice, while the theoretically possible time-respecting paths (a,c) -> (c,d) and (b,c) -> (c,e) (which we would actually expect to occur with the same probability than the others) are absent!

You see that this change in the causal structure is exclusively driven by the ordering of links and it is this' phenomenon that we will now study in more realistic scenarios.

Reading from files: A larger synthetic example

We now consider a larger (synthetic) example which we read from a TEDGE file, a simple file format which contains a list of time-stamped links. TEDGE files have the format

source target time a b 4 b c 7 u v 3 u v 3 ...

The separation character (here: space) can be arbitrary and can be specified upon reading. Furthermore, the ordering of columns can be arbitrary and thus a header line indicating which column refers to source, target and timestamp of an edge is required. As you can see in the example above, time-stamped edges do not necessarily have to be ordered according to time. In addition to simple integer time stamps, actual (arbitrary) string timestamps like 2015-05-23 09:23:22 can be used. In this case a format string indicating the meaning of the timestamp has to be included.

Here, we use a simple example of integer time stamps, reading a file of time-stamped edges that were synthetically generated to include order correlations.

In [21]:
t = tn.readFile('data/tutorial/example.tedges', sep=' ')
2016-12-03 13:23:24 [Severity.INFO]	Reading trigram data ...
2016-12-03 13:23:24 [Severity.INFO]	finished.
2016-12-03 13:23:24 [Severity.INFO]	Building index data structures ...
2016-12-03 13:23:24 [Severity.INFO]	finished.
2016-12-03 13:23:24 [Severity.INFO]	Sorting time stamps ...
2016-12-03 13:23:24 [Severity.INFO]	finished.

We can actually print a summary of its basic properties by using the Summary() function.

In [22]:
print(t.Summary())
Nodes:			30
Time-stamped links:	13200
Links/Nodes:		440.0
Observation period:	[0, 13199]
Observation length:	13199
Time stamps:		13200
Avg. inter-event dt:	1.0
Min/Max inter-event dt:	1/1
Max Time Diff (delta):	1
Two-paths:		not calculated
First-order network:	not constructed
Second-order network:	not constructed

Let us now extract time-respecting paths of length two and visualize the first-order aggregate network.

In [23]:
t.setMaxTimeDiff(delta=1)
t.extractTwoPaths()

g1 = t.igraphFirstOrder()

visual_style = {}
visual_style["bbox"] = (600, 600)
visual_style["margin"] = 60
visual_style["edge_width"] = [x/100 for x in g1.es()["weight"]]
visual_style["vertex_size"] = 20
visual_style["vertex_label_size"] = 12
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_arrow_size"] = 0.3
visual_style["layout"] = g1.layout_auto()

igraph.plot(g1, 'tutorial/example_g1.png', **visual_style)
display(Image(filename='tutorial/example_g1.png'))
2016-12-03 13:23:29 [Severity.INFO]	Extracting two-paths for delta = 1...
2016-12-03 13:23:29 [Severity.INFO]	finished.
2016-12-03 13:23:29 [Severity.INFO]	Constructing first-order aggregate network ...
2016-12-03 13:23:29 [Severity.INFO]	finished.

Importantly, the weighted time-aggregated network does not show any specific structure. However, due to the specific ordering of time-stamped edges in the temporal network, the second-order aggregate network can look completely different, meaning that only very specific time-respecting paths actually occur. Here, the second-order network shows three pronounced temporal communities that are not visible in the first-order network.

In [24]:
g2 = t.igraphSecondOrder()

visual_style = {}
visual_style["edge_arrow_size"] = 0.01
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 5

igraph.plot(g2, 'tutorial/example_g2.png', **visual_style)
display(Image(filename='tutorial/example_g2.png'))
2016-12-03 13:23:34 [Severity.INFO]	Constructing second-order aggregate network ...
2016-12-03 13:23:34 [Severity.INFO]	finished.