MinerLC/MinerLSD: closed pattern miners in attributed graphs

Dominique Bouthinon, Guillaume Santini, Henry Soldano

Figure 2-2 HA core



Overview

Enumerating abstract closed patterns and bi-patterns

The minerLC software aims at enumerating the abstract closed patterns in an attributed graph.

In attributed graphs addressed by minerLC, a vertex is described as an itemset, i.e. a subset of a set of items. Whenever the vertex labels are not itemsets, some discretization has then to be performed as a pre-processing step.

In standard closed itemset mining, a closed pattern is the largest itemset common to an object subset. When the object are the vertices of a graph, such a vertex subset induces a subgraph. In abstract closed pattern mining, this subgraph is reduced to its core subgraph, according to some core definition, as for instance, the k-core defined by [Seidman1983]. An abstract closed pattern (also called a core closed pattern) is then the largest pattern common to the vertices of the core subgraph derived from some vertex subset.

Technically, when considering some itemset q, to obtain an abstract closed pattern we build the vertex subset ext (q) in which q occurs, then compute the core p(ext(q)) of the subgraph induced by ext(q) using the core operator p. Finally, we compute the associated abstract closed pattern by applying the intersection operator int. To sumarize, to any itemset q we associate:

The general idea is that by considering core subgraphs we only consider cohesive subnetworks together with the items shared by their vertices, This is exemplified in the small example that follows in which in the core subgraph all vertices have to belong to a triangle:

Figure 2-2 HA core

This idea has been adaptated to directed graphs, by intoriducing an appropriate core, namely the hub-authority core, and two-mode networks. In the latter case, the two vertex sets V1 and V2 (representing for instance respectively actors and movies) are described according to two different sets of items, and this leads to define bi-patterns. An abstract closed bi-pattern is then an itemset pair (q1, q2) such that q1 is the largest itemset common to all vertices from V1 in the core subgraph and q2 is the largest itemset common to all vertices from V2 in the core subgraph.

Figure 2-2 HA core

Interestingly we may also have bi-patterns in directed networks and even in undirected netwroks by defining a core as a vertex subset pairs. This allows for example to look, in the LAWYERS network presented below, to a core subgraph in which young lawyers from Boston ask for advice from older lawyers: components.

Ranking abstract closed patterns and bi-patterns

Abstract closed patterns (and bi-patterns) may be ordered by considering that most interesting patterns are unecpected patterns according to some random model. Here we consider the expected size of the core of a subgraph induced by a random vertex subset of size n drawn from the attributed graph. Given a pattern which occurs in n vertices, we may then consider to what extent the core size is larger than the expected value. Expected core sizes and related standard deviations are computed after the enumeration using a different program.

Computing structural communities associated to abstract closed patterns [Soldano2015]

The core subgraph associated to an abstract closed pattern may contain various structural communities. An obvious definition of a structural community is a connected component, but stronger definitions have been proposed by [Palla2007], namely k-communities.

A k-community is made of k-cliques and within the commuty it is always possible to find a path such that two adjacent k-cliques share a ( k-1)-clique.

We may then define a local closed pattern l as the largest itemset common to a structural community e extracted from the core subgraph associated to some pattern q, and enumerate the triples (l,e,q) we call local rules.

Such a rule expresses the specificity of the structural community e in the pattern q core subgraph:

We have implemented various structural communities definitions, including 3-communities.




Source files minerLC and minerLSD


This archive contains a Makefile and a directory with a french documentation that describes the formats of input and output files and explains how to compile and run the programs:


Purposes and Command line examples

Single pattern mining of undirected networks [Soldano2014cr], [Soldano2017fcasna]

Example of enumeration of 16-core abstract patterns of the DBLP_P dataset. Core vertices must have at least 16 neighbors in the core subgraph.


  bash $> minerLC DBLP_P.nri -d1 16 > DBLP_P.nri.d1_16

Example of computation of expected abstract supports of found patterns using a monte-carlo sampling with size 100 samplings (takes the outpfile generated supra). supportAttendus takes as input the file generated by minerLC with the same parameters.


  bash $> supportsAttendus DBLP_P.nri DBLP_P.nri.d1_16 -d1 16 -n 100 > DBLP_P.nri.d1_16_100

Single pattern mining of directed networks [Soldano:2017ab]

Example of enumeration of the 'h-a' HA-core abstract patterns in the LAWYERS dataset. The 'h-a' Hub-Authority core of a directed graph is made of a Hub vertex subset together with an Authority vertex subset. The associated core subgraph is such that all Hub vertices have outdegree at least 'h' towards Authority vertices while all Authority vertices have indegree at least 'a' from Hub vertices. Option -o is required to mine directed graphs.


  bash $> minerLC LAWYERS.nri -o -hub_aut1 4 4 > LAWYERS.nri.o_hub_aut1_4_4

Example of computation of expected abstract supports of found patterns using a monte-carlo simulation with size 100 sampling.


  bash $> supportsAttendus LAWYERS.nri LAWYERS.nri.o_hub_aut1_4_4 -hub_aut1 4 4 -n 100 > LAWYERS.nri.o_hub_aut1_4_4_100

Single pattern mining of undirected network with local modularity constraint [Atzmueller2018dn]

Example of enumeration of 10-core abstract patterns on the DBLP_C dataset. The -l 0.1 means that only patterns whose core subgraph has local modularity at least 0.1 are output.


  bash $> minerLC DBLP_C.nri -lm 0.1 -d1 10 > DBLP_C.nri.lm_0.1_d1_10

Bi-pattern mining of two-mode and directed networks [Soldano2018mn]

Example of enumeration of 6-6- BHA core abstract bi-patterns on the two-mode PtoC_BI dataset.

In a two-mode network there are two vertex sets V1 and V2 and the edges relate vertices in V1 to vertices in V2. The h-a BHA core is a pair (H,A) of vertex subsets where H is a subset of V1 while V2 is a subset of V2. The associated core subgraph is made of the edges relating H to A. All H vertices have degree at least 'h' while all A vertices have degree at least 'a'. A bi-pattern is then a pair (q1,q2) where q1 selects objects in V1 while q2 selects objects in V2.

The format of the input file allwong bi-pattern mining in minerLC is particular and described in the section nri files.


  minerLC PtoC.nri -o -hub_aut1 6 6 > PtoC.nri.o_hub_aut1_6_6




Dataset files

Downloadable files should be unziped before use with minerLC programs.

Description / Source Name link
Undirected [Michell1996he]
Social network from : The Teenage Friends and Lifestyle Study.
Original datastes from https://www.stats.ox.ac.uk/~snijders/siena/Glasgow_data.htm.
The dataset provided in the nri format derives from the data of the first year of the study. The graph comes from Glasgow-friendship.RData and the labels from Glasgow-substances.RData. Occasional and regular tobacco and drug consumption labels are undifferenciated, as well as alcohol cconsumption once a week and more than once a week.
S50 S50.nri.gz (0.7 Ko)
Undirected [Galbrun2014dm]
DBLP co-authorship of publications (DBLP_XL) and selection of the publications at the ICDM conferences (DBLP_C) and song preference of LastFM users.
Source code and original datasets from the author's site: http://research.ics.aalto.fi/dmg/lic.tgz
DBLP_C DBLP_C.nri (172 Ko)
LASTFM LASTFM.nri.gz (367 Ko)
DBLP_XL DBLP_XL.nri.gz (367 Ko)
Undirected [Prado2013tkde]
Orginal Co-authorship labelled graph has been enriched by 3 additional labels that collects interest in domains. Those labels were derived from the original ones.
Original dataset avialable from author on request at https://perso.liris.cnrs.fr/marc.plantevit/doku/doku.php?id=data_sets
DBLP_P DBLP_P.nri.gz (1.5 Mo)
Undirected [Silva2010kdd]
Co-authorship Labelled graphs.
Source code and original datasets from author's site : https://code.google.com/archive/p/scpm/
DBLP_S DBLP_S.nri.gz (5.6 Mo)
Undirected
Social networking, bookmarking, and tagging information from a set of 2K users from Delicious social bookmarking system.(http://www.delicious.com).
Original dataset are downloadable from https://grouplens.org/datasets/hetrec-2011/
DELICIOUS DELICIOUS.nri.gz (1.8 Ko)
Directed [Lazega1997]
Corporate law partnership advice network.
Original dataset downloadable from https://www.stats.ox.ac.uk/~snijders/siena/Lazega_lawyers_data.htm
LAWYERS LAWYERS.nri.gz (1.8 Ko)
Unirected [Lazega1997]
Vairiante of the previous Corporate law partnership advice network.
LAWYERS_U
Two-Mode (bi-pattern mining).
Network relating particpants to campaigns of the deep sea BENTHOS Program.
PtoC_BI PtoC_BI.nri.gz
Unidrected (bi-pattern mining).
Directed (bi-pattern mining).
PtoC_BI.nri LAZEGA_BI.nri DBLP_E_BI.nri COEXP_BI.nri



CDF wolfram computable document (interactive)

Hereunder two CDF documents allowing to investigate interactively the 4-4 and 9-9 Hub Authority core closed patterns. In order to "play" the interactive documents you need to download and install the wolfram CDF player from the following URL:
https://www.wolfram.com/cdf-player/



References




Contacts

In case of difficulties please contact us at: