🌐 TopoBenchmarkX (TBX) 🍩

TopoBenchmarkX (TBX) is a modular Python library designed to standardize benchmarking and accelerate research in Topological Deep Learning (TDL). In particular, TBX allows users to train and compare the performance of all sorts of Topological Neural Networks (TNNs) across different topological domains, where by topological domain we mean a graph, a simplicial complex, a cellular complex, or a hypergraph.

📌 Overview

The main pipeline trains and evaluates a wide range of state-of-the-art TNNs and Graph Neural Networks (GNNs) (see Neural Networks) on numerous and varied datasets and benchmark tasks (see Datasets).

Additionally, the library offers the ability to transform, i.e., lift, each dataset from one topological domain to another (see Liftings), enabling for the first time an exhaustive inter-domain comparison of TNNs.

⚙ Neural Networks

We list the neural networks trained and evaluated by TopoBenchmarkX, organized by the topological domain over which they operate: graph, simplicial complex, cellular complex or hypergraph. Many of these neural networks were originally implemented in TopoModelX.

Graphs

Simplicial complexes

Cellular complexes

Hypergraphs

🚀 Liftings

We list the liftings used in TopoBenchmarkX to transform datasets. Here, a lifting refers to a function that transforms a dataset defined on a topological domain (e.g., on a graph) into the same dataset but supported on a different topological domain (e.g., on a simplicial complex).

Graph2Simplicial

| Name | Description | Reference |
|---|---|---|
| CliqueLifting | The algorithm finds the cliques in the graph and turns them into simplices. Given a clique, it first adds the simplex containing all the nodes of the clique, then the simplices formed by all combinations with one node missing, then with two nodes missing, and so on, until all possible pairs have been added. The method then moves on to the next clique. | Simplicial Complexes |
| KHopLifting | For each node in the graph, take the set of its neighbors up to distance k, together with the node itself. Each such set is then treated as a simplex. The dimension of each simplex depends on the size of the neighborhood: a node with d neighbors within distance k forms a d-simplex (d + 1 vertices, counting the node itself). | Neighborhood Complexes |
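The two Graph2Simplicial liftings above can be illustrated with the following plain-Python sketch. It is not TopoBenchmarkX's implementation: the graph is assumed to be an adjacency dict mapping each node to a set of neighbors, maximal cliques are found with a basic Bron-Kerbosch recursion (a library routine such as NetworkX's `find_cliques` could be used instead), and all function names are hypothetical. For the k-hop variant, the sketch assumes each neighborhood simplex is closed under faces so the result is a valid simplicial complex, and it does not cap the simplex dimension, so the cost grows exponentially with neighborhood size.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Maximal cliques via the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_lifting(adj):
    """Each clique becomes a simplex; its sub-simplices are added down to pairs."""
    simplices = {frozenset([v]) for v in adj}  # nodes are the 0-simplices
    for clique in maximal_cliques(adj):
        # Full clique first, then one node missing, two missing, ..., pairs.
        for size in range(len(clique), 1, -1):
            for face in combinations(sorted(clique), size):
                simplices.add(frozenset(face))
    return simplices


def k_hop_neighborhood(adj, source, k):
    """Nodes within graph distance k of source, source included (BFS by layers)."""
    seen = {source}
    frontier = {source}
    for _ in range(k):
        frontier = {u for v in frontier for u in adj[v]} - seen
        seen |= frontier
    return frozenset(seen)


def k_hop_simplicial_lifting(adj, k=1):
    """Each k-hop neighborhood is a simplex; all of its faces are added as well."""
    simplices = set()
    for v in adj:
        top = k_hop_neighborhood(adj, v, k)
        # Closing under faces is an assumption of this sketch; cost is 2**len(top).
        for size in range(1, len(top) + 1):
            for face in combinations(sorted(top), size):
                simplices.add(frozenset(face))
    return simplices
```

For example, on the 4-cycle 0-1-2-3 with the chord 0-2, `clique_lifting` yields the triangles {0, 1, 2} and {0, 2, 3} but no simplex {1, 3}, whereas `k_hop_simplicial_lifting` with k = 1 produces the full 3-simplex on all four nodes, because node 0 is adjacent to every other node.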

Graph2Cell

| Name | Description | Reference |
|---|---|---|
| CellCycleLifting | To lift a graph to a cell complex (CC), we proceed as follows. First, we identify a finite set of cycles (closed loops) within the graph. Second, each identified cycle is associated with a 2-cell whose boundary is that cycle. The nodes and edges of the cell complex are inherited from the graph. | Appendix B |
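The table does not specify which cycles are selected, so the sketch below assumes one common choice: a fundamental cycle basis built from a BFS spanning forest, with one cycle per non-tree edge (NetworkX's `cycle_basis` offers a similar routine). This is an illustration, not necessarily what TopoBenchmarkX does. The graph is assumed to be an adjacency dict of neighbor sets, the function names are hypothetical, and the output is a dict mapping each rank (0, 1, 2) to its cells, with each 2-cell stored as the closed sequence of nodes on its boundary.

```python
from collections import deque


def cycle_basis(adj):
    """Fundamental cycles: one per non-tree edge of a BFS spanning forest."""
    parent, depth = {}, {}
    for root in adj:  # one BFS tree per connected component
        if root in parent:
            continue
        parent[root], depth[root] = None, 0
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in parent:
                    parent[u], depth[u] = v, depth[v] + 1
                    queue.append(u)

    cycles, visited_edges = [], set()
    for v in adj:
        for u in adj[v]:
            edge = frozenset((u, v))
            if edge in visited_edges:
                continue
            visited_edges.add(edge)
            if parent[u] == v or parent[v] == u:
                continue  # tree edges close no cycle
            # Climb from both endpoints to their lowest common ancestor.
            path_u, path_v = [u], [v]
            a, b = u, v
            while a != b:
                if depth[a] >= depth[b]:
                    a = parent[a]
                    path_u.append(a)
                else:
                    b = parent[b]
                    path_v.append(b)
            # u -> ... -> ancestor -> ... -> v; the edge (v, u) closes the loop.
            cycles.append(tuple(path_u + path_v[-2::-1]))
    return cycles


def cell_cycle_lifting(adj):
    """Rank-0/1 cells are the graph's nodes/edges; each basis cycle is a 2-cell."""
    nodes = sorted(adj)
    edges = sorted({tuple(sorted((u, v))) for v in adj for u in adj[v]})
    return {0: nodes, 1: edges, 2: cycle_basis(adj)}
```

On a connected graph this choice yields exactly |E| - |V| + 1 two-cells; for the 4-cycle with one chord (4 nodes, 5 edges) that gives 2 two-cells, the triangles 0-1-2 and 0-2-3.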

Graph2Hypergraph

| Name | Description | Reference |
|---|---|---|
| KHopLifting | For each node in the graph, the algorithm finds the set of nodes that are at most k hops (edges) away from that node. This set forms a hyperedge. The process is repeated for all nodes in the graph. | Section 3.4 |
| KNearestNeighborsLifting | For each node in the graph, the method finds the k nearest nodes according to the Euclidean distance between their feature vectors. These k nodes form a hyperedge. The process is repeated for all nodes in the graph. | Section 3.1 |
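Both Graph2Hypergraph liftings can be sketched as follows, as a minimal illustration under stated assumptions rather than the library's code. The graph is assumed to be an adjacency dict of neighbor sets and the node features a list of equal-length numeric tuples indexed by node; the function names are hypothetical. For the nearest-neighbor variant, the sketch assumes the node itself counts among its k nearest nodes (it is always placed first), with remaining distance ties broken by node index.

```python
import math
from collections import deque


def k_hop_hyperedges(adj, k=1):
    """One hyperedge per node: all nodes within k hops of it (the node included)."""
    hyperedges = []
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            if dist[v] == k:
                continue  # do not expand past distance k
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        hyperedges.append(frozenset(dist))
    return hyperedges


def knn_hyperedges(features, k):
    """One hyperedge per node: its k nearest nodes by Euclidean feature distance."""
    hyperedges = []
    for i, x in enumerate(features):
        # Sort all nodes by distance to node i; node i itself is placed first,
        # and remaining ties are broken by node index.
        order = sorted(
            range(len(features)),
            key=lambda j: (math.dist(x, features[j]), j != i, j),
        )
        hyperedges.append(frozenset(order[:k]))
    return hyperedges
```

Because one hyperedge is created per node, identical hyperedges can appear (for example, two mutual nearest neighbors with k = 2 produce the same set twice); whether to deduplicate them is a design choice this sketch leaves open.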

📚 Datasets

| Dataset | Task | Description | Reference |
|---|---|---|---|
| Cora | Classification | Cocitation dataset. | Source |
| Citeseer | Classification | Cocitation dataset. | Source |
| Pubmed | Classification | Cocitation dataset. | Source |
| MUTAG | Classification | Graph-level classification. | Source |
| PROTEINS | Classification | Graph-level classification. | Source |
| NCI1 | Classification | Graph-level classification. | Source |
| NCI109 | Classification | Graph-level classification. | Source |
| IMDB-BIN | Classification | Graph-level classification. | Source |
| IMDB-MUL | Classification | Graph-level classification. | Source |
| REDDIT | Classification | Graph-level classification. | Source |
| Amazon | Classification | Heterophilic dataset. | Source |
| Minesweeper | Classification | Heterophilic dataset. | Source |
| Empire | Classification | Heterophilic dataset. | Source |
| Tolokers | Classification | Heterophilic dataset. | Source |
| US-county-demos | Regression | Each node attribute is used in turn as the target. | Source |
| ZINC | Regression | Graph-level regression. | Source |

🔍 References

To learn more about TopoBenchmarkX, we invite you to read the paper:

@misc{topobenchmarkx2024,
        title={TopoBenchmarkX},
        author={PyT-Team},
        year={2024},
        eprint={TBD},
        archivePrefix={arXiv},
        primaryClass={cs.LG}
}

If you find TopoBenchmarkX useful, we would appreciate it if you cited us!

🦾 Getting Started

Check out our tutorials to get started!