ICML Challenge 2024 ------------------- Welcome to the Topological Deep Learning Challenge 2024: *Beyond the Graph Domain*, jointly organized by `TAG-DS `__ & PyT-Team and hosted by the `Geometry-grounded Representation Learning and Generative Modeling (GRaM) Workshop `__ at ICML 2024. .. seealso:: Link to the challenge repository: ``__. *Organizers, reviewers, and contributors:* Guillermo Bernárdez, Lev Telyatnikov, Marco Montagna, Federica Baccini, Nina Miolane, Mathilde Papillon, Miquel Ferriol-Galmés, Mustafa Hajij, Theodore Papamarkou, Johan Mathe, Audun Myers, Scott Mahan, Olga Zaghen, Maria Sofia Bucarelli, Sharvaree Vadgama, Erik Bekkers, Tim Doster, Tegan Emerson, Henry Kvinge. WINNERS -------- 🏆 1st CATEGORY ~~~~~~~~~~~~~~~~~~ 🥇 1st-place, PR 63: Random Latent Clique Lifting (graph to simplicial); Mauricio Tec, Claudio Battiloro, George Dasoulas 🥈 2nd-place, PR 58: Hypergraph Heat Kernel Lift (Hypergraph to Simplicial); Matt Piekenbrock 🥉 3rd-place, PR 11: DnD Lifting (Graph to Simplicial Complex); Jonas Verhellen 🏆 2nd CATEGORY ~~~~~~~~~~~~~~~~~~ 🥇 1st-place, PR 57: Simplicial Paths Lifting (graph to COMBINATORIAL); Manuel Lecha, Andrea Cavallo, Claudio Battiloro 🥈 2nd-place, PR 32: Matroid lifting (Graph to Combinatorial); Giordan Escalona 🥉 3rd-place, PR 33: Forman-Ricci Curvature Coarse Geometry Lifting (Graph to Hypergraph); Michael Banf, Dominik Filipiak, Max Schattauer, Liliya Imasheva 🏆 3rd CATEGORY ~~~~~~~~~~~~~~~~~~ 🥇 1st-place, PR 53: PointNet++ lifting (pointcloud to hypergraph); Julian Suk, Patryk Rygiel 🥈 2nd-place, PR 30: Kernel Lifting (Graph to Hypergraph); Alexander Nikitin 🥉 3rd-place, PR 45: Mixture of Gaussians + MST lifting (pointcloud to hypergraph); Sebastian Mežnar, Boshko Koloski, Blaž Škrlj 🏆 4th CATEGORY ~~~~~~~~~~~~~~~~~~ 🥇 1st-place, PR 32: Matroid lifting (Graph to Combinatorial); Giordan Escalona 🥈 2nd-place, PR 33: Forman-Ricci Curvature Coarse Geometry Lifting (Graph to Hypergraph); Michael Banf, Dominik Filipiak, Max Schattauer, Liliya Imasheva 🥉 3rd-place, PR 58: Hypergraph Heat Kernel Lift (Hypergraph to Simplicial); Matt Piekenbrock 🏆HONORABLE MENTIONS ~~~~~~~~~~~~~~~~~~~ - ⭐ **Great Contributors** ⭐ - Martin Carrasco (PRs 28, 29, 41, 50) - Bertran Miquel-Oliver, Manel Gil-Sorribes, Alexis Molina, Victor Guallar (PRs 14, 16, 21, 37, 42) - Theodore Long (PRs 22, 35, 65) - Jonas Verhellen (PRs 5, 7, 8, 10, 11) - Pavel Snopov (PRs 6, 9, 18, 20) - Julian Suk, Patryk Rygiel (PRs 23, 34, 53) - 🎖️ **Highlighted Submissions** 🎖️ - PR 49: Modularity Maximization Lifting (Graph to Hypergraph), Valentina Sánchez - PR 47: Universal Strict Lifting (Hypergraph to Combinatorial), Álvaro Martinez - PR 48: Mapper Lifting (Graph to Hypergraph), Halley Fritze, Marissa Masden Motivation ---------- The field of Topological Deep Learning (TDL) aims to extend Graph Neural Networks (GNN) by naturally processing relations between more than two elements, ubiquitous in any realistic complex system (social connections within a community, molecular structures and reactions, n-body interactions,...). More specifically, TDL methods allow to go beyond graphs’ pairwise connections –and local neighborhoods– by encoding higher-order relationships through algebraic topology concepts; :numref:`fig:domains` illustrates the standard *topological domains* used to that end. .. figure:: ./figures/domain_categories_with_relations.png :name: fig:domains Domains of Topological Deep Learning. Figure adopted from `Papillon et al., 2023. `__ (work highly recommended for those interested in knowing more details about these domains). Despite its recent emergence, TDL is already postulated to become a relevant tool in many research areas and applications, from complex physical systems and signal processing to molecular analysis or social interactions, to name a few. However, a current limiting factor is that most existing datasets are presently stored as point clouds or graphs, i.e. the traditional discrete domains (:numref:`fig:domains`). While researchers have introduced various mechanisms for extracting higher-order elements, it remains unclear how to optimize the process given a specific dataset and task. The main purpose of this challenge is precisely to foster new research and knowledge about effective mappings between different topological domains and data structures, helping to expand the current scope and impact of TDL to a much broader range of contexts and scenarios. **Remark:** This process of mapping a data structure to different topological domains is called "topological lifting", or just "lifting" to abbreviate; :numref:`fig:lifting` shows some visual examples. The "topological lifting" transfers data from the original domain where the signal (node/edge features) exists to the new domain where new objects can exist, such as simplicial/cell complexes. Therefore, it’s crucial to also derive and provide descriptors for these introduced objects, and this process is known as "feature lifting". .. figure:: figures/lifting_maps.png :name: fig:lifting Examples of liftings: (a) A graph is lifted to a hypergraph by adding hyperedges that connect groups of nodes. (b) A graph can be lifted to a cellular complex by adding faces of any shape. (c) Hyperedges can be added to a cellular complex to lift the structure to a combinatorial complex. Figure adopted from `Hajij et al. 2023. `__ Description of the Challenge ---------------------------- We propose that participants design and implement lifting mappings between different data structures and topological domains (point-clouds, graphs, hypergraphs, simplicial/cell/combinatorial complexes), to bridge the gap the gap between TDL and all kinds of existing datasets. In particular, participants can either implement already proposed liftings from the literature (see Related References section below), or design original approaches; both options are equally allowed. In the case of submissions with novel liftings, we emphasize that participants will keep all the credit for their implementations, and neither the challenge nor its related reward outcomes will prevent them from publishing their independent works. Moreover, aligned with the primary goal of broadening the footprint and usage of TDL, the submission of liftings from point-clouds/graphs to higher-order topological domains is encouraged. However, this is not a requirement: the challenge also welcomes transformations between any other pair of topological structures (e.g., from hypergraph to simplicial domain). In order to ensure consistency and compositionality, implementations need to be compatible with the ``BaseTransform`` class of ``torch_geometric``, and should leverage NetworkX/TopoNetX/ TopoEmbedX libraries when dealing with graph/higher-order datasets. Each submission takes the form of a Pull Request to ``challenge-icml-2024`` repo containing the necessary code for implementing a lifting map. More details are provided in subsequent sections below. **Note:** We invite participants to review this webpage regularly, as more details might be added to answer relevant questions and doubts raised to the organizers. Reward Outcomes [1]_ -------------------- ⭐️ Every submission respecting the submission requirements will be included in a white paper summarizing the findings of the challenge, published in PMLR through the `GRaM Workshop `__ at ICML 2024. All participants with qualifying submissions will have the opportunity to co-author this publication. 📘 Winning participants will also have the opportunity to co-author a paper with an in-depth study on lifting procedures, focusing on assessing different transformations across topological domains. This work will be submitted to the Journal of Data-centric Machine Learning Research (DMLR). 🏆 Winner submissions will receive special recognition at ICML 2024 `GRaM Workshop `__, where the Award Ceremony will take place. Deadline -------- The final submission deadline is **July 12th, 2024 (AoE)**. Participants are welcome to modify their Pull Request until this time. Guidelines ---------- Everyone can participate and participation is free –only principal PyT-Team developers are excluded. It is sufficient to: - Send a valid Pull Request (i.e. passing all tests) before the deadline. - Respect Submission Requirements (see below). Teams are accepted, and there is no restriction on the number of team members. An acceptable Pull Request automatically subscribes a participant/team to the challenge. We encourage participants to start submitting their Pull Request early on, as this helps addressing potential issues with the code. Moreover, earlier Pull Requests will be given priority consideration in the case of multiple submissions of similar quality implementing the same lifting. A Pull Request should contain no more than one lifting. However, there is no restriction on the number of submissions (Pull Requests) per participant/team. Submission Requirements ----------------------- The submission must implement a valid lifting transformation between any pair of the following data structures: point-cloud/graph, hypergraph, simplicial complex, cell complex, and combinatorial complex. For a lifting to be valid, participants must implement a mapping between the topological structures of the considered domains –*topology lifting*. Participants may optionally implement a procedure to define the features over the resulting topology –*feature lifting*. All submitted code must comply with the challenge’s GitHub Action workflow, successfully passing all tests, linting, and formatting (i.e., ruff). Moreover, to ensure consistency, we ask participants to use TopoNetX’s classes to manage simplicial/cell/combinatorial complexes whenever these topological domains are the target –i.e., destination– of the lifting. **Remark:** We highly encourage the use of TopoNetX, TopoEmbedX and NetworkX libraries. Topology Lifting (Required) ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Submissions can implement already proposed liftings from the literature, as well as novel approaches. In the case of original liftings, we note that neither the challenge nor its related publications will prevent participants from publishing their own work: they will keep all the credit for their implementations. For a lifting from a certain source domain ``src`` (e.g. graph) to a topological destination ``dst`` (e.g. simplicial), a submission consists of a Pull Request to the ICML Challenge repository that contains the following files: #. ``{id lifting}_lifting.py`` (e.g. ``clique_lifting.py``) - Stored in the directory ``modules/transforms/liftings/{src}2{dst}/`` - | Defines a class ``{Id lifting}Lifting`` that implements a ``lift_topology()`` method that performs the specific ``{src}2{dst}`` topological lifting considered (e.g. | ``SimplicialCliqueLifting`` as a ``graph2simplicial`` transform). It may also implement other auxiliary functions, and can override parent methods if required. - | This class must inherit from ``{Src}2{Dst}Lifting`` abstract class (e.g. | ``Graph2SimplicialLifting``), which we provide for every pair {``src``,\ ``dst``} within the corresponding directory. When justified, this and other abstract parent classes can be modified. - The implemented lifting –and in general, any implemented data/feature transformation– must be added to ``TRANSFORMS`` dictionary in ``data_transform.py`` file, located at ``modules/transforms/`` directory. The keys of ‘TRANSFORMS’ dictionary correspond to ‘transform_name’ field in corresponding .yaml files while the values refers to corresponding class that implements the logic of the transform. | **Note:** We provide several lifting examples for ``graph2simplicial``, ``graph2cell`` and | ``graph2hypergraph``. #. ``{id lifting}_lifting.yaml`` (e.g. ``clique_lifting.yaml``) - Stored in the directory ``configs/transforms/liftings/{src}2{dst}/`` - Defines the default parameters of the implemented transform. **Note:** You can find config examples for all our implemented liftings and data transforms. #. ``{id lifting}_lifting.ipynb`` (e.g. ``clique_lifting.py``) - Stored in the directory ``tutorials/{src}2{dst}/`` - Contains the following steps: #. Dataset Loading - Implements the pipeline to load a dataset from the ``src`` domain. Since the challenge repository doesn’t allow storing large files, loaders must download datasets from external sources into the ``datasets/`` folder. - This pipeline is provided for several graph-based datasets. For any other ``src`` domain, participants are allowed to transform graph datasets into the corresponding domain through our provided lifting mappings –or just dropping their connectivity to get point-clouds. - *(Bonus)* Designing a loader for a new dataset (ones that are not already provided in the tutorials) will be positively taken into consideration in the final evaluation. #. Pre-processing the Dataset - Applies the lifting transform to the dataset. - | Needs to be done through the ``PreProcessor``, which we provide in | ``modules/io/preprocess/preprocessor.py``. #. Running a Model over the Lifted Dataset - Creates a Neural Network model that operates over the ``dst`` domain, leveraging TopoModelX for higher order topologies or torch_geometric for graphs. - Runs the model on the lifted dataset. **Note:** Several examples are provided in ``tutorials/``. #. ``test_{id lifting}.py`` (e.g. ``test_cycle_lifting.py``) - Stored in the directory ``tests/transforms/liftings/{src}2{dst}/`` - Contains one class, ``Test{Id lifting}``, which contains unit tests for all of the methods contained in the ``{Id lifting}Lifting`` class. - Please use pytest (not unittest). **Note:** We provide several examples in the corresponding directories. Feature Lifting (Optional) ~~~~~~~~~~~~~~~~~~~~~~~~~~ Some TDL models require well-defined features on higher-order structures (e.g. 2-cells, or hyperedges); therefore, in their more general formulation, liftings also need to produce initial features for every topological element of the ``dst`` domain. In particular, in all our examples we make use of a straightforward ``SumProjection`` transform to that end, which gets the desired structural features by sequentially projecting the original signals via incidence matrices. Participants are more than welcome to implement new feature liftings mappings, which can be added to the ``feature_liftings.py`` file at the ``modules/transforms/feature_liftings/`` directory. However, we remark this is optional, and it will only be regarded as a bonus. **Note:** Please, reach out if you want to know more details about how to implement a new feature lifting and/or a novel data loader. We also provide some data manipulations transforms that could be useful when defining more complex data pipelines. Evaluation ---------- Award Categories ~~~~~~~~~~~~~~~~ Given the lack of an exhaustive analysis about different types of procedures to infer the topological structure within TDL, there is not any particular requirement for submitted liftings –apart from a high-quality code implementation. To promote and guide diversity in submissions, we propose the following general, non-mutually exclusive award categories: - Best implementation of a existing lifting from the literature. - Best novel lifting design that only leverages the relational information of the source domain (i.e. connectivity-based lifting). - Best novel lifting design that leverages the original features of the source domain to infer the target topology (i.e. feature-based lifting). If available, connectivity can also be simultaneously used. - Best implementation of a deterministic lifting (existing or novel). - Best implementation of a non-deterministic lifting (existing or novel). We encourage participants to tag and categorize their Pull Requests with these and other possible taxonomies. In fact, we might reconsider some categories based on participants feedback and submissions. Additionally, we reserve the right to award some honorable mentions considering some aspects like originality, theoretical robustness, loading interesting datasets, implementing new feature liftings, etc. Evaluation Procedure ~~~~~~~~~~~~~~~~~~~~ The Condorcet method will be used to rank the submissions and decide on the winners in each category. The evaluation criteria will be: - Does the submission implement the lifting correctly? Is it reasonable and well-defined? - How readable/clean is the implementation? How well does the submission respect the submission requirements? - Is the submission well-written? Do the docstrings clearly explain the methods? Are the unit tests robust? Note that these criteria do not reward final model performance, nor the complexity of the method. Rather, the goal is to implement well-written and accurate liftings that will unlock further experimental evidence and insights in this field. Selected PyT-Team maintainers and collaborators, as well as each team whose submission(s) respect(s) the guidelines, will vote once on Google Form to express their preference for the best submission in each category. Note that each team gets only one vote/domain, even if there are several participants in the team. A link to a Google Form will be provided to record the votes. While the form will ask for an email address to identify the voter, voters’ identities will remain secret–only the final ranking will be shared. Questions --------- Feel free to contact us through GitHub issues on this repository, or through the `Geometry and Topology in Machine Learning slack `__. Alternatively, you can contact us via mail at any of these accounts: guillermo.bernardez@upc.edu, lev.telyatnikov@uniroma1.it. .. _reference_list: Related References ------------------ As a support to participants, in this section we share some related references that propose topological liftings or might help defining novel ones. #. Papillon, M., Sanborn, S., Hajij, M., & Miolane, N. (2023). Architectures of Topological Deep Learning: A Survey on Topological Neural Networks. *arXiv preprint arXiv:2304.10031*. #. Hajij, M., Zamzmi, G., Papamarkou, T., Miolane, N., Guzmán-Sáenz, A., Ramamurthy, K. N., et al. (2022). Topological deep learning: Going beyond graph data. *arXiv preprint arXiv:2206.00606*. #. Baccini, F., Geraci, F., & Bianconi, G. (2022). Weighted simplicial complexes and their representation power of higher-order network data and topology. *Physical Review E, 106*\ (3), 034319. #. Barbarossa, S., & Sardellitti, S. (2020). Topological signal processing over simplicial complexes. *IEEE Transactions on Signal Processing, 68*, 2992–3007. #. Battiloro, C., Spinelli, I., Telyatnikov, L., Bronstein, M., Scardapane, S., & Di Lorenzo, P. (2023). From latent graph to latent topology inference: differentiable cell complex module. *arXiv preprint arXiv:2305.16174*. #. Benson, A. R., Gleich, D. F., & Higham, D. J. (2021). Higher-order network analysis takes off, fueled by classical ideas and new data. *arXiv preprint arXiv:2103.05031*. #. Bodnar, C., Frasca, F., Otter, N., Wang, Y., Lio, P., Montufar, G. F., & Bronstein, M. (2021). Weisfeiler and Lehman go cellular: Cw networks. In *Advances in neural information processing systems* (Vol. 34, pp. 2625–2640). #. Bodnar, C., Frasca, F., Wang, Y., Otter, N., Montufar, G. F., Lio, P., & Bronstein, M. (2021, July). Weisfeiler and Lehman go topological: Message passing simplicial networks. In *International Conference on Machine Learning* (pp. 1026-1037). PMLR. #. Elshakhs, Y. S., Deliparaschos, K. M., Charalambous, T., Oliva, G., & Zolotas, A. (2024). A Comprehensive Survey on Delaunay Triangulation: Applications, Algorithms, and Implementations Over CPUs, GPUs, and FPGAs. *IEEE Access*. #. Ferri, M., Bergomi, D. M. G., & Zu, L. (2018). Simplicial complexes from graphs towards graph persistence. *arXiv preprint arXiv:1805.10716*. #. Gao, Y., Zhang, Z., Lin, H., Zhao, X., Du, S., & Zou, C. (2020). Hypergraph learning: Methods and practices. *IEEE Transactions on Pattern Analysis and Machine Intelligence, 44*\ (5), 2548–2566. #. Hajij, M., & Istvan, K. (2021). Topological deep learning: Classification neural networks. *arXiv preprint arXiv:2102.08354*. #. Hajij, M., Zamzmi, G., Papamarkou, T., Miolane, N., Guzmán-Sáenz, A., & Ramamurthy, K. N. (2022). Higher-order attention networks. *arXiv preprint arXiv:2206.00606, 2*\ (3), 4. #. Hajij, M., Zamzmi, G., Papamarkou, T., Guzman-Saenz, A., Birdal, T., & Schaub, M. T. (2023). Combinatorial complexes: bridging the gap between cell complexes and hypergraphs. In *2023 57th Asilomar Conference on Signals, Systems, and Computers* (pp. 799–803). IEEE. #. Hoppe, J., & Schaub, M. T. (2024). Representing Edge Flows on Graphs via Sparse Cell Complexes. In *Learning on Graphs Conference* (pp. 1-1). PMLR. #. Jogl, F., Thiessen, M., & Gärtner, T. (2022). Reducing learning on cell complexes to graphs. In *ICLR 2022 Workshop on Geometrical and Topological Representation Learning*. #. Kahle, M. (2007). The neighborhood complex of a random graph. *Journal of Combinatorial Theory, Series A, 114*\ (2), 380–387. #. Kahle, M., & others. (2014). Topology of random simplicial complexes: a survey. *AMS Contemp. Math, 620*, 201–222. #. Kahle, M. (2016). Random simplicial complexes. *arXiv preprint arXiv:1607.07069*. #. Liu, X., & Zhao, C. (2023). Eigenvector centrality in simplicial complexes of hypergraphs. *Chaos: An Interdisciplinary Journal of Nonlinear Science, 33*\ (9). #. Lucas, M., Gallo, L., Ghavasieh, A., Battiston, F., & De Domenico, M. (2024). Functional reducibility of higher-order networks. *arXiv preprint arXiv:2404.08547*. #. Ruggeri, N., Battiston, F., & De Bacco, C. (2024). Framework to generate hypergraphs with community structure. *Physical Review E, 109*\ (3), 034309. .. [1] By law, US researchers are not allowed to co-author papers with scholars from some countries and institutions. Participants are responsible for checking eligibility.