{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Implementing your own model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial we show how to implement your own model and test it on a dataset. \n", "\n", "This particular example uses the MUTAG dataset, applies a hypergraph lifting to turn its graphs into hypergraphs, and defines a model that operates on them. \n", "\n", "We train the model on the training split, monitor it on the validation split, and finally evaluate it on the test split." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Table of contents\n", " [1. Imports](##sec1)\n", "\n", " [2. Configurations and utilities](##sec2)\n", "\n", " [3. Loading the data](##sec3)\n", "\n", " [4. Backbone definition](##sec4)\n", "\n", " [5. Model initialization](##sec5)\n", "\n", " [6. Training](##sec6)\n", "\n", " [7. Testing the model](##sec7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Imports " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import lightning as pl\n", "import torch\n", "from omegaconf import OmegaConf\n", "\n", "from topobenchmarkx.data.loaders import GraphLoader\n", "from topobenchmarkx.data.preprocessor import PreProcessor\n", "from topobenchmarkx.dataloader import TBXDataloader\n", "from topobenchmarkx.evaluator import TBXEvaluator\n", "from topobenchmarkx.loss import TBXLoss\n", "from topobenchmarkx.model import TBXModel\n", "from topobenchmarkx.nn.encoders import AllCellFeatureEncoder\n", "from topobenchmarkx.nn.readouts import PropagateSignalDown\n", "from topobenchmarkx.optimizer import TBXOptimizer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Configurations and utilities " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Configurations can be specified in YAML files or, as in this example, directly in code." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "loader_config = {\n", " \"data_domain\": \"graph\",\n", " \"data_type\": \"TUDataset\",\n", " \"data_name\": \"MUTAG\",\n", " \"data_dir\": \"./data/MUTAG/\",\n", "}\n", "\n", "transform_config = { \"khop_lifting\":\n", " {\"transform_type\": \"lifting\",\n", " \"transform_name\": \"HypergraphKHopLifting\",\n", " \"k_value\": 1,}\n", "}\n", "\n", "split_config = {\n", " \"learning_setting\": \"inductive\",\n", " \"split_type\": \"random\",\n", " \"data_seed\": 0,\n", " \"data_split_dir\": \"./data/MUTAG/splits/\",\n", " \"train_prop\": 0.5,\n", "}\n", "\n", "in_channels = 7\n", "out_channels = 2\n", "dim_hidden = 16\n", "\n", "readout_config = {\n", " \"readout_name\": \"PropagateSignalDown\",\n", " \"num_cell_dimensions\": 1,\n", " \"hidden_dim\": dim_hidden,\n", " \"out_channels\": out_channels,\n", " \"task_level\": \"graph\",\n", " \"pooling_type\": \"sum\",\n", "}\n", "\n", "loss_config = {\"task\": \"classification\", \"loss_type\": \"cross_entropy\"}\n", "\n", "evaluator_config = {\"task\": \"classification\",\n", " \"num_classes\": out_channels,\n", " \"metrics\": [\"accuracy\", \"precision\", \"recall\"]}\n", "\n", "optimizer_config = {\"optimizer_id\": \"Adam\",\n", " \"parameters\":\n", " {\"lr\": 0.001,\"weight_decay\": 0.0005}\n", " }\n", "\n", "loader_config = OmegaConf.create(loader_config)\n", "transform_config = OmegaConf.create(transform_config)\n", "split_config = OmegaConf.create(split_config)\n", "readout_config = OmegaConf.create(readout_config)\n", "loss_config = OmegaConf.create(loss_config)\n", "evaluator_config = OmegaConf.create(evaluator_config)\n", "optimizer_config = OmegaConf.create(optimizer_config)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Loading the data " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example we use the MUTAG dataset. 
It is a graph dataset, and we use the k-hop lifting to transform its graphs into hypergraphs. \n", "\n", "We invite you to check out the README of the [repository](https://github.com/pyt-team/TopoBenchmarkX) to learn more about the various liftings offered." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Transform parameters are the same, using existing data_dir: ./data/MUTAG/MUTAG/khop_lifting/1116229528\n" ] } ], "source": [ "graph_loader = GraphLoader(loader_config)\n", "\n", "dataset, dataset_dir = graph_loader.load()\n", "\n", "preprocessor = PreProcessor(dataset, dataset_dir, transform_config)\n", "dataset_train, dataset_val, dataset_test = preprocessor.load_dataset_splits(split_config)\n", "datamodule = TBXDataloader(dataset_train, dataset_val, dataset_test, batch_size=32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Backbone definition " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To implement a new model, we only need to define its `forward` method.\n", "\n", "For a hypergraph with $n$ nodes and $m$ hyperedges, this model computes the hyperedge features as $X_1 = B_1 \\cdot X_0$, where $B_1 \\in \\mathbb{R}^{n \\times m}$ is the incidence matrix, with $(B_1)_{ij}=1$ if node $i$ belongs to hyperedge $j$ and $0$ otherwise. The k-hop lifting used here creates one hyperedge per node, so $m = n$ and the product is well defined.\n", "\n", "The outputs are then computed as $X^{'}_0=\\text{ReLU}(W_0 \\cdot X_0 + b_0)$ and $X^{'}_1=\\text{ReLU}(W_1 \\cdot X_1 + b_1)$, using two linear layers with weights $W_0, W_1$ and biases $b_0, b_1$, each followed by a ReLU activation." 
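, "\n", "\n", "As a small worked illustration (a hypothetical toy example, not taken from MUTAG), consider a path graph on three nodes $0 - 1 - 2$ and assume each hyperedge is the closed 1-hop neighbourhood of one node, i.e. $e_0=\\{0,1\\}$, $e_1=\\{0,1,2\\}$ and $e_2=\\{1,2\\}$. With scalar node features $X_0 = (1, 2, 3)^\\top$, the hyperedge features would be\n", "\n", "$$X_1 = B_1 \\cdot X_0 = \\begin{pmatrix} 1 & 1 & 0 \\\\ 1 & 1 & 1 \\\\ 0 & 1 & 1 \\end{pmatrix} \\begin{pmatrix} 1 \\\\ 2 \\\\ 3 \\end{pmatrix} = \\begin{pmatrix} 3 \\\\ 6 \\\\ 5 \\end{pmatrix},$$\n", "\n", "so each hyperedge feature is the sum of the features of its member nodes."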
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "class myModel(pl.LightningModule):\n", "    def __init__(self, dim_hidden):\n", "        super().__init__()\n", "        self.dim_hidden = dim_hidden\n", "        # One linear layer for node features, one for hyperedge features\n", "        self.linear_0 = torch.nn.Linear(dim_hidden, dim_hidden)\n", "        self.linear_1 = torch.nn.Linear(dim_hidden, dim_hidden)\n", "\n", "    def forward(self, batch):\n", "        x_0 = batch.x_0\n", "        incidence_hyperedges = batch.incidence_hyperedges\n", "        # Aggregate node features into hyperedge features\n", "        x_1 = torch.sparse.mm(incidence_hyperedges, x_0)\n", "\n", "        # Linear layer followed by ReLU, separately for nodes and hyperedges\n", "        x_0 = self.linear_0(x_0)\n", "        x_0 = torch.relu(x_0)\n", "        x_1 = self.linear_1(x_1)\n", "        x_1 = torch.relu(x_1)\n", "\n", "        # Collect labels, batch assignment and updated features\n", "        model_out = {\"labels\": batch.y, \"batch_0\": batch.batch_0}\n", "        model_out[\"x_0\"] = x_0\n", "        model_out[\"hyperedge\"] = x_1\n", "        return model_out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Model initialization " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the backbone is defined, we can create the TBXModel, which takes care of everything else that is needed to train the model. \n", "\n", "First we need to instantiate a few classes that specify the behaviour of the model." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "backbone = myModel(dim_hidden)\n", "\n", "readout = PropagateSignalDown(**readout_config)\n", "loss = TBXLoss(**loss_config)\n", "feature_encoder = AllCellFeatureEncoder(in_channels=[in_channels], out_channels=dim_hidden)\n", "\n", "evaluator = TBXEvaluator(**evaluator_config)\n", "optimizer = TBXOptimizer(**optimizer_config)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can instantiate the TBXModel." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "model = TBXModel(backbone=backbone,\n", " backbone_wrapper=None,\n", " readout=readout,\n", " loss=loss,\n", " feature_encoder=feature_encoder,\n", " evaluator=evaluator,\n", " optimizer=optimizer,\n", " compile=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Training " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use the `lightning` trainer to train the model." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "GPU available: True (mps), used: False\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n", "/opt/miniconda3/envs/topox/lib/python3.11/site-packages/lightning/pytorch/trainer/setup.py:187: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.\n", "/opt/miniconda3/envs/topox/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `lightning.pytorch` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. 
Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default\n", "\n", " | Name | Type | Params\n", "----------------------------------------------------------\n", "0 | feature_encoder | AllCellFeatureEncoder | 448 \n", "1 | backbone | myModel | 544 \n", "2 | readout | PropagateSignalDown | 34 \n", "3 | val_acc_best | MeanMetric | 0 \n", "----------------------------------------------------------\n", "1.0 K Trainable params\n", "0 Non-trainable params\n", "1.0 K Total params\n", "0.004 Total estimated model params size (MB)\n", "/opt/miniconda3/envs/topox/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.\n", "/opt/miniconda3/envs/topox/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: The ``compute`` method of metric MulticlassAccuracy was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.\n", " warnings.warn(*args, **kwargs) # noqa: B028\n", "/opt/miniconda3/envs/topox/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: The ``compute`` method of metric MulticlassPrecision was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.\n", " warnings.warn(*args, **kwargs) # noqa: B028\n", "/opt/miniconda3/envs/topox/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: The ``compute`` method of metric MulticlassRecall was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.\n", " warnings.warn(*args, **kwargs) # noqa: B028\n", "/opt/miniconda3/envs/topox/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' 
does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.\n", "`Trainer.fit` stopped: `max_epochs=50` reached.\n" ] } ], "source": [ "# Increase the number of epochs to get better results\n", "trainer = pl.Trainer(max_epochs=50, accelerator=\"cpu\", enable_progress_bar=False, log_every_n_steps=1)\n", "\n", "trainer.fit(model, datamodule)\n", "train_metrics = trainer.callback_metrics" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Training metrics\n", " --------------------------\n", "train/accuracy: 0.7234\n", "train/precision: 0.7849\n", "train/recall: 0.5888\n", "val/loss: 0.5416\n", "val/accuracy: 0.7234\n", "val/precision: 0.7355\n", "val/recall: 0.5844\n", "train/loss: 0.4863\n" ] } ], "source": [ "print(' Training metrics\\n', '-'*26)\n", "for key in train_metrics:\n", " print('{:<21s} {:>5.4f}'.format(key+':', train_metrics[key].item()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Testing the model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can test the model and obtain the results." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/miniconda3/envs/topox/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.\n" ] }, { "data": { "text/html": [ "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
       "┃        Test metric               DataLoader 0        ┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
       "│       test/accuracy           0.6808510422706604     │\n",
       "│         test/loss              0.532489538192749     │\n",
       "│      test/precision           0.8333333730697632     │\n",
       "│        test/recall            0.5588235259056091     │\n",
       "└───────────────────────────┴───────────────────────────┘\n",
       "
\n" ], "text/plain": [ "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n", "┃\u001b[1m \u001b[0m\u001b[1m Test metric \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m DataLoader 0 \u001b[0m\u001b[1m \u001b[0m┃\n", "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n", "│\u001b[36m \u001b[0m\u001b[36m test/accuracy \u001b[0m\u001b[36m \u001b[0m│\u001b[35m \u001b[0m\u001b[35m 0.6808510422706604 \u001b[0m\u001b[35m \u001b[0m│\n", "│\u001b[36m \u001b[0m\u001b[36m test/loss \u001b[0m\u001b[36m \u001b[0m│\u001b[35m \u001b[0m\u001b[35m 0.532489538192749 \u001b[0m\u001b[35m \u001b[0m│\n", "│\u001b[36m \u001b[0m\u001b[36m test/precision \u001b[0m\u001b[36m \u001b[0m│\u001b[35m \u001b[0m\u001b[35m 0.8333333730697632 \u001b[0m\u001b[35m \u001b[0m│\n", "│\u001b[36m \u001b[0m\u001b[36m test/recall \u001b[0m\u001b[36m \u001b[0m│\u001b[35m \u001b[0m\u001b[35m 0.5588235259056091 \u001b[0m\u001b[35m \u001b[0m│\n", "└───────────────────────────┴───────────────────────────┘\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "trainer.test(model, datamodule)\n", "test_metrics = trainer.callback_metrics" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Testing metrics\n", " -------------------------\n", "test/loss: 0.5325\n", "test/accuracy: 0.6809\n", "test/precision: 0.8333\n", "test/recall: 0.5588\n" ] } ], "source": [ "print(' Testing metrics\\n', '-'*25)\n", "for key in test_metrics:\n", " print('{:<20s} {:>5.4f}'.format(key+':', test_metrics[key].item()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", 
"pygments_lexer": "ipython3", "version": "3.11.3" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }