groffo committed
Commit 8573586 · 0 Parent(s)

Initial commit of FSG-ViT
.idea/.gitignore ADDED
@@ -0,0 +1,3 @@
+ # Default ignored files
+ /shelf/
+ /workspace.xml
.idea/ViT_with_FSG.iml ADDED
@@ -0,0 +1,12 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <module type="PYTHON_MODULE" version="4">
+   <component name="NewModuleRootManager">
+     <content url="file://$MODULE_DIR$" />
+     <orderEntry type="jdk" jdkName="Python 3.10 (cvpr)" jdkType="Python SDK" />
+     <orderEntry type="sourceFolder" forTests="false" />
+   </component>
+   <component name="PyDocumentationSettings">
+     <option name="format" value="PLAIN" />
+     <option name="myDocStringFormat" value="Plain" />
+   </component>
+ </module>
.idea/inspectionProfiles/Project_Default.xml ADDED
@@ -0,0 +1,6 @@
+ <component name="InspectionProjectProfileManager">
+   <profile version="1.0">
+     <option name="myName" value="Project Default" />
+     <inspection_tool class="PyPackageRequirementsInspection" enabled="false" level="WARNING" enabled_by_default="false" />
+   </profile>
+ </component>
.idea/inspectionProfiles/profiles_settings.xml ADDED
@@ -0,0 +1,6 @@
+ <component name="InspectionProjectProfileManager">
+   <settings>
+     <option name="USE_PROJECT_PROFILE" value="false" />
+     <version value="1.0" />
+   </settings>
+ </component>
.idea/misc.xml ADDED
@@ -0,0 +1,4 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <project version="4">
+   <component name="ProjectRootManager" version="2" project-jdk-name="Python 3.10 (cvpr)" project-jdk-type="Python SDK" />
+ </project>
.idea/modules.xml ADDED
@@ -0,0 +1,8 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <project version="4">
+   <component name="ProjectModuleManager">
+     <modules>
+       <module fileurl="file://$PROJECT_DIR$/.idea/ViT_with_FSG.iml" filepath="$PROJECT_DIR$/.idea/ViT_with_FSG.iml" />
+     </modules>
+   </component>
+ </project>
.idea/vcs.xml ADDED
@@ -0,0 +1,6 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <project version="4">
+   <component name="VcsDirectoryMappings">
+     <mapping directory="$PROJECT_DIR$" vcs="Git" />
+   </component>
+ </project>
README.md ADDED
@@ -0,0 +1,144 @@
+ # 🔬 Feature Selection Gates (FSG) for Vision Transformers (ViT)
+ 
+ This repository provides a modular, extensible PyTorch implementation of **Feature Selection Gates (FSG)** with **Gradient Routing (GR)**, integrated into **Vision Transformers (ViTs)**. The approach is proposed in:
+ 
+ > **Feature Selection Gates with Gradient Routing for Endoscopic Image Computing**
+ > Giorgio Roffo, Carlo Biffi, Pietro Salvagnini, Andrea Cherubini
+ > Presented at MICCAI 2024
+ > 📄 [Paper](https://papers.miccai.org/miccai-2024/316-Paper0410.html) | 🧠 [arXiv](https://arxiv.org/abs/2407.04400) | 💻 [Code](https://github.com/cosmoimd/feature-selection-gates)
+ 
+ ---
+ 
+ ## 📌 What Is FSG?
+ 
+ **FSG** introduces **learnable gates** that sparsify transformer blocks by modulating their residual connections, acting as **online feature selectors**. This encourages **sparse connectivity**, which reduces overfitting and improves generalization, especially on small and imbalanced datasets.
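+ 
+ As a minimal illustration of the idea (see `vit_with_fsg.py` in this repository for the actual implementation), a gated residual path can be sketched as:
+ 
+ ```python
+ import torch
+ import torch.nn as nn
+ 
+ class GatedResidual(nn.Module):
+     """Illustrative sketch only: a per-channel sigmoid gate on a residual branch."""
+     def __init__(self, dim, sublayer):
+         super().__init__()
+         self.sublayer = sublayer                    # e.g. an attention or MLP block
+         self.gate = nn.Parameter(torch.zeros(dim))  # one learnable gate per channel
+ 
+     def forward(self, x):
+         # The sigmoid keeps gate values in (0, 1); small values suppress the branch
+         return x + torch.sigmoid(self.gate) * self.sublayer(x)
+ ```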
+ 
+ **Gradient Routing (GR)** enables dual-phase optimization:
+ - One optimizer updates FSG parameters
+ - A second optimizer updates the base model
+ 
+ This separation allows **task-specific tuning** and ensures stable learning.
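+ 
+ In this repository's demo scripts, GR is approximated with a single `AdamW` optimizer and two parameter groups, giving the gates a higher learning rate. A rough sketch (assuming `model` is a ViT already wrapped by `vit_with_fsg`):
+ 
+ ```python
+ import torch.optim as optim
+ 
+ # Split parameters by name: FSG gates vs. everything else in the wrapped ViT
+ fsg_params = [p for n, p in model.named_parameters() if "fsg_rgb_ls" in n]
+ base_params = [p for n, p in model.named_parameters() if "fsg_rgb_ls" not in n]
+ 
+ optimizer = optim.AdamW([
+     {"params": base_params, "lr": 1e-4},  # base ViT weights
+     {"params": fsg_params, "lr": 5e-4},   # FSG gates are tuned more aggressively
+ ])
+ ```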
+ 
+ ---
+ 
+ ## 💡 Why Use FSG?
+ 
+ ✅ **Plug & play**: Can be integrated into **any ViT architecture**
+ ✅ Works on **natural images**, **medical images**, and beyond
+ ✅ Can be adapted to **NLP Transformers** like GPTs and BERT
+ ✅ Lightweight and highly regularizing
+ ✅ Compatible with **multi-stream CNNs** and hybrid models
+ 
+ ⚠️ While our focus is on **endoscopic image computing**, the method has also shown performance improvements on **CIFAR-100**, demonstrating its applicability to **standard vision tasks**.
+ 
+ ---
+ 
+ ## 🧪 How to Use the FSG Wrapper
+ 
+ Use the `vit_with_fsg` wrapper from `vit_with_fsg.py` to augment a pretrained ViT from `torchvision`:
+ 
+ ```python
+ from torchvision.models import vit_b_16, ViT_B_16_Weights
+ from vit_with_fsg import vit_with_fsg
+ import torch
+ 
+ print("📥 Loading pretrained ViT_B_16...")
+ backbone = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
+ 
+ print("🔧 Wrapping with Feature Selection Gates (FSG)...")
+ model = vit_with_fsg(vit_backbone=backbone)
+ 
+ print("🧪 Running dummy input...")
+ dummy_input = torch.randn(1, 3, 224, 224)
+ output = model(dummy_input)
+ 
+ print("✅ Done. Output shape:", output.shape)
+ ```
+ 
+ ---
+ 
+ ## 🚀 Demo Scripts
+ 
+ We provide full working training and inference examples:
+ 
+ | Dataset     | Training Script            | Inference Script            | Checkpoint Path                              |
+ |-------------|----------------------------|-----------------------------|----------------------------------------------|
+ | MNIST       | `demo_training_mnist.py`   | `demo_inference_mnist.py`   | `./checkpoints/fsg_vit_mnist_demo.pth`       |
+ | Imagenette  | `demo_training_imnet.py`   | `demo_inference_imnet.py`   | `./checkpoints/fsg_vit_imagenette_demo.pth`  |
+ 
+ Each demo:
+ - Trains a ViT-B/16 with FSG on a reduced dataset for speed.
+ - Uses separate learning rates for FSG and base model parameters.
+ - Includes GPU-aware prints and a training progress bar.
+ - Saves checkpoints for reproducible inference.
+ 
+ ### ▶️ Example Usage
+ 
+ ```bash
+ # Train on Imagenette
+ python demo_training_imnet.py
+ 
+ # Inference on Imagenette
+ python demo_inference_imnet.py --checkpoint ./checkpoints/fsg_vit_imagenette_demo.pth
+ ```
+ 
+ ```bash
+ # Train on MNIST
+ python demo_training_mnist.py
+ 
+ # Inference on MNIST
+ python demo_inference_mnist.py --checkpoint ./checkpoints/fsg_vit_mnist_demo.pth
+ ```
+ 
+ > ⚠️ These demos use reduced test sets and train for only a few iterations to keep runtime short. They are not meant for benchmarking, but for showcasing FSG integration.
+ 
+ ---
+ 
+ ## 🧠 Applicability Beyond Endoscopy
+ 
+ Although designed for **polyp size estimation in colonoscopy**, FSG is a **general mechanism** for:
+ - **Image classification**
+ - **Medical image analysis**
+ - **Multimodal fusion**
+ - **NLP Transformers** (e.g., GPTs, BERT): apply FSG over token embeddings (see the sketch below)
+ 
+ We strongly encourage researchers to test FSG in **non-medical** domains.
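+ 
+ A rough, untested sketch of the NLP adaptation mentioned above (class and variable names are purely illustrative, not part of this repository):
+ 
+ ```python
+ import torch
+ import torch.nn as nn
+ 
+ class GatedEmbedding(nn.Module):
+     """Hypothetical adaptation: gate each embedding dimension before the encoder stack."""
+     def __init__(self, embedding: nn.Embedding):
+         super().__init__()
+         self.embedding = embedding
+         self.gate = nn.Parameter(torch.zeros(embedding.embedding_dim))  # one gate per dimension
+ 
+     def forward(self, token_ids):
+         return torch.sigmoid(self.gate) * self.embedding(token_ids)
+ ```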
+ 
+ ---
+ 
+ ## 📦 Files and Structure
+ 
+ ```
+ .
+ ├── vit_with_fsg.py           # ViT + FSG wrapper
+ ├── demo_training_mnist.py
+ ├── demo_inference_mnist.py
+ ├── demo_training_imnet.py
+ ├── demo_inference_imnet.py
+ └── checkpoints/              # Folder for .pth checkpoints
+ ```
+ 
+ ---
+ 
+ ## 📚 Citation
+ 
+ Please cite our work if you use this repository:
+ 
+ ```bibtex
+ @inproceedings{roffo2024FSG,
+   title={Feature Selection Gates with Gradient Routing for Endoscopic Image Computing},
+   author={Giorgio Roffo and Carlo Biffi and Pietro Salvagnini and Andrea Cherubini},
+   booktitle={MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, Marrakech, Morocco, October 2024},
+   year={2024},
+   organization={Springer}
+ }
+ ```
+ 
+ ---
+ 
+ ## 📬 Contact
+ 
+ Lead Author: **Giorgio Roffo**
+ 📧 giorgio.roffo@gmail.com
+ 🏢 Cosmo Intelligent Medical Devices (IMD), Lainate, Italy
+ 
+ For more: [github.com/cosmoimd/feature-selection-gates](https://github.com/cosmoimd/feature-selection-gates)
demo_inference_imnet.py ADDED
@@ -0,0 +1,124 @@
+ '''
+ Demo script for applying Feature Selection Gates (FSG) to torchvision Vision Transformers
+ and running inference on the ImageNet-mini (Imagenette) validation set.
+ 
+ Each image is resized to 224x224 and has 3 RGB channels to be compatible with ViT.
+ 
+ Usage:
+ 
+     python demo_inference_imnet.py --checkpoint ./checkpoints/fsg_vit_imagenette_demo.pth
+ 
+ Paper:
+     https://papers.miccai.org/miccai-2024/316-Paper0410.html
+ Code:
+     https://github.com/cosmoimd/feature-selection-gates
+ Contact:
+     giorgio.roffo@gmail.com
+ '''
+ 
+ import warnings
+ warnings.filterwarnings("ignore")
+ 
+ import os
+ import sys
+ import tarfile
+ import urllib.request
+ import torch
+ import psutil
+ from torchvision.models import vit_b_16, ViT_B_16_Weights
+ from vit_with_fsg import vit_with_fsg
+ from torchvision import transforms
+ from torchvision.datasets import ImageFolder
+ from torch.utils.data import DataLoader
+ import torch.nn.functional as F
+ from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
+ from tqdm import tqdm
+ 
+ import argparse
+ 
+ parser = argparse.ArgumentParser(description="FSG-ViT inference on Imagenette")
+ parser.add_argument("--checkpoint", type=str, default=None, help="Path to .pth file of trained FSG-ViT model")
+ args = parser.parse_args()
+ 
+ if __name__ == "__main__":
+     warnings.filterwarnings("ignore", message="Failed to load image Python extension*")
+     wrn = False
+     print(f"\n📌 To run this script:\n"
+           f" ▶ Without checkpoint: python {os.path.basename(__file__)}\n"
+           f" ▶ With checkpoint: python {os.path.basename(__file__)} --checkpoint path/to/model.pth\n")
+ 
+     # Device and system info
+     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+     print(f"\n🖥️ Using device: {device}")
+     if device.type == "cuda":
+         print(f"🚀 CUDA device: {torch.cuda.get_device_name(0)}")
+         print(f"💾 GPU memory total: {torch.cuda.get_device_properties(0).total_memory / (1024 ** 3):.2f} GB")
+     print(f"🧠 System RAM: {psutil.virtual_memory().total / (1024 ** 3):.2f} GB")
+ 
+     print("\n📥 Loading pretrained ViT backbone from torchvision...")
+     backbone = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
+ 
+     print("🔧 Wrapping with Feature Selection Gates (FSG)...")
+     model = vit_with_fsg(backbone).to(device)
+ 
+     if args.checkpoint is not None:
+         print(f"📂 Loading model weights from: {args.checkpoint}")
+         model.load_state_dict(torch.load(args.checkpoint, map_location=device))
+     else:
+         wrn = True
+         print("\n⚠️ No checkpoint provided. Evaluating randomly initialized model! 🧪\n")
+         print("❗ Note: The model has not been trained. Results will reflect a randomly initialized backbone.")
+ 
+     model.eval()
+ 
+     print("📚 Loading Imagenette validation set (224x224 RGB)...")
+     imagenette_path = "./imagenette2-160/val"
+     if not os.path.exists(imagenette_path):
+         print("📦 Downloading Imagenette...")
+         url = "https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz"
+         tgz_path = "imagenette2-160.tgz"
+         urllib.request.urlretrieve(url, tgz_path)
+         print("📂 Extracting Imagenette dataset...")
+         with tarfile.open(tgz_path, "r:gz") as tar:
+             tar.extractall()
+         os.remove(tgz_path)
+         print("✅ Dataset ready.")
+ 
+     transform = transforms.Compose([
+         transforms.Resize((224, 224)),
+         transforms.ToTensor(),
+         transforms.Normalize(mean=[0.5]*3, std=[0.5]*3)
+     ])
+ 
+     dataset = ImageFolder(root=imagenette_path, transform=transform)
+     dataloader = DataLoader(dataset, batch_size=32, shuffle=False)
+ 
+     y_true = []
+     y_pred = []
+ 
+     print("🧪 Running inference on Imagenette validation set using FSG-ViT-B-16 (code by G. Roffo)...\n\n")
+     with torch.no_grad():
+         for images, labels in tqdm(dataloader, desc="🔍 Inference progress", ncols=100):
+             images = images.to(device)
+             labels = labels.to(device)
+             outputs = model(images)
+             preds = torch.argmax(F.softmax(outputs, dim=1), dim=1)
+             y_true.extend(labels.cpu().tolist())
+             y_pred.extend(preds.cpu().tolist())
+ 
+     print("✅ Inference completed.")
+ 
+     acc = accuracy_score(y_true, y_pred)
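+     # Precision/recall/F1 below use macro averaging: each class contributes equally, regardless of frequency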
+     prec = precision_score(y_true, y_pred, average='macro', zero_division=0)
+     rec = recall_score(y_true, y_pred, average='macro', zero_division=0)
+     f1 = f1_score(y_true, y_pred, average='macro', zero_division=0)
+ 
+     if wrn:
+         print("\n⚠️ No checkpoint provided. Evaluated randomly initialized model! 🧪\n")
+         print(f"\n📌 To run this script:\n"
+               f" ▶ With checkpoint: python {os.path.basename(__file__)} --checkpoint path/to/model.pth\n")
+ 
+     print(f"📊 Accuracy: {acc * 100:.2f}%")
+     print(f"📊 Precision: {prec * 100:.2f}%")
+     print(f"📊 Recall: {rec * 100:.2f}%")
+     print(f"📊 F1 Score: {f1 * 100:.2f}%")
demo_inference_mnist.py ADDED
@@ -0,0 +1,108 @@
+ '''
+ Demo script for applying Feature Selection Gates (FSG) to torchvision Vision Transformers
+ and running inference on the MNIST test set.
+ 
+ Each MNIST image is resized to 224x224 and converted to 3 channels to be compatible with ViT.
+ 
+ Usage:
+ 
+     python demo_inference_mnist.py --checkpoint ./checkpoints/fsg_vit_mnist_demo.pth
+ 
+ Paper:
+     https://papers.miccai.org/miccai-2024/316-Paper0410.html
+ Code:
+     https://github.com/cosmoimd/feature-selection-gates
+ Contact:
+     giorgio.roffo@gmail.com
+ '''
+ 
+ import torch
+ import psutil
+ import argparse
+ import warnings
+ from torchvision.models import vit_b_16, ViT_B_16_Weights
+ from vit_with_fsg import vit_with_fsg
+ from torchvision.datasets import MNIST
+ from torchvision import transforms
+ from torch.utils.data import DataLoader
+ import torch.nn.functional as F
+ from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
+ from tqdm import tqdm
+ import os
+ 
+ warnings.filterwarnings("ignore")
+ 
+ parser = argparse.ArgumentParser(description="FSG-ViT inference on MNIST")
+ parser.add_argument("--checkpoint", type=str, default=None, help="Path to .pth file of trained FSG-ViT model")
+ args = parser.parse_args()
+ 
+ if __name__ == "__main__":
+     warnings.filterwarnings("ignore", message="Failed to load image Python extension*")
+     wrn = False
+     print(f"\n📌 To run this script:\n"
+           f" ▶ Without checkpoint: python {os.path.basename(__file__)}\n"
+           f" ▶ With checkpoint: python {os.path.basename(__file__)} --checkpoint path/to/model.pth\n")
+ 
+     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+     print(f"\n🖥️ Using device: {device}")
+     if device.type == "cuda":
+         print(f"🚀 CUDA device: {torch.cuda.get_device_name(0)}")
+         print(f"💾 GPU memory total: {torch.cuda.get_device_properties(0).total_memory / (1024 ** 3):.2f} GB")
+     print(f"🧠 System RAM: {psutil.virtual_memory().total / (1024 ** 3):.2f} GB")
+ 
+     print("\n📥 Loading pretrained ViT backbone from torchvision...")
+     backbone = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
+ 
+     print("🔧 Wrapping with Feature Selection Gates (FSG)...")
+     model = vit_with_fsg(backbone).to(device)
+ 
+     if args.checkpoint is not None:
+         print(f"📂 Loading model weights from: {args.checkpoint}")
+         model.load_state_dict(torch.load(args.checkpoint, map_location=device))
+     else:
+         wrn = True
+         print("\n⚠️ No checkpoint provided. Evaluating randomly initialized model! 🧪\n")
+         print("❗ Note: The model has not been trained. Results will reflect a randomly initialized backbone.")
+ 
+     model.eval()
+ 
+     print("📚 Loading MNIST test set (resized to 224x224, 3-channel)...")
+     transform = transforms.Compose([
+         transforms.Resize((224, 224)),
+         transforms.Grayscale(num_output_channels=3),
+         transforms.ToTensor(),
+         transforms.Normalize((0.5,), (0.5,))
+     ])
+ 
+     test_dataset = MNIST(root="./data", train=False, download=True, transform=transform)
+     test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
+ 
+     y_true = []
+     y_pred = []
+ 
+     print("🧪 Running inference on MNIST test set using FSG-ViT-B-16 (code by G. Roffo)...")
+     with torch.no_grad():
+         for images, labels in tqdm(test_loader, desc="🔍 Inference progress", ncols=100):
+             images = images.to(device)
+             labels = labels.to(device)
+             outputs = model(images)
+             preds = torch.argmax(F.softmax(outputs, dim=1), dim=1)
+             y_true.extend(labels.cpu().tolist())
+             y_pred.extend(preds.cpu().tolist())
+ 
+     print("✅ Inference completed.")
+ 
+     acc = accuracy_score(y_true, y_pred)
+     prec = precision_score(y_true, y_pred, average='macro', zero_division=0)
+     rec = recall_score(y_true, y_pred, average='macro', zero_division=0)
+     f1 = f1_score(y_true, y_pred, average='macro', zero_division=0)
+ 
+     if wrn:
+         print("\n⚠️ No checkpoint provided. Evaluated randomly initialized model! 🧪\n")
+         print(f"\n📌 To run this script:\n"
+               f" ▶ With checkpoint: python {os.path.basename(__file__)} --checkpoint path/to/model.pth\n")
+ 
+     print(f"📊 Accuracy: {acc * 100:.2f}%")
+     print(f"📊 Precision: {prec * 100:.2f}%")
+     print(f"📊 Recall: {rec * 100:.2f}%")
+     print(f"📊 F1 Score: {f1 * 100:.2f}%")
demo_training_imnet.py ADDED
@@ -0,0 +1,114 @@
+ '''
+ Demo training script for Feature Selection Gates (FSG) with ViT on Imagenette
+ 
+ This script loads the Imagenette dataset (ImageNet-mini),
+ trains a ViT model augmented with FSG, and saves the model checkpoint.
+ 
+ Paper:
+     https://papers.miccai.org/miccai-2024/316-Paper0410.html
+ Code:
+     https://github.com/cosmoimd/feature-selection-gates
+ Contact:
+     giorgio.roffo@gmail.com
+ '''
+ 
+ import os
+ import tarfile
+ import urllib.request
+ import torch
+ import torch.nn as nn
+ import torch.optim as optim
+ import psutil
+ from tqdm import tqdm
+ from torchvision import transforms
+ from torchvision.models import vit_b_16, ViT_B_16_Weights
+ from torchvision.datasets import ImageFolder
+ from torch.utils.data import DataLoader
+ from vit_with_fsg import vit_with_fsg
+ 
+ # System info
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ print(f"\n🖥️ Using device: {device}")
+ if device.type == "cuda":
+     print(f"🚀 CUDA device: {torch.cuda.get_device_name(0)}")
+     print(f"💾 GPU memory total: {torch.cuda.get_device_properties(0).total_memory / (1024 ** 3):.2f} GB")
+ print(f"🧠 System RAM: {psutil.virtual_memory().total / (1024 ** 3):.2f} GB")
+ 
+ # Dataset path (this demo trains on the small Imagenette val split to keep runtime short)
+ imagenette_path = "./imagenette2-160/val"
+ if not os.path.exists(imagenette_path):
+     print("📦 Downloading Imagenette...")
+     url = "https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz"
+     tgz_path = "imagenette2-160.tgz"
+     urllib.request.urlretrieve(url, tgz_path)
+     print("📂 Extracting Imagenette dataset...")
+     with tarfile.open(tgz_path, "r:gz") as tar:
+         tar.extractall()
+     os.remove(tgz_path)
+     print("✅ Dataset ready.")
+ 
+ # Transforms
+ transform = transforms.Compose([
+     transforms.Resize((224, 224)),
+     transforms.ToTensor(),
+     transforms.Normalize(mean=[0.5]*3, std=[0.5]*3)
+ ])
+ 
+ # Dataset and loader
+ dataset = ImageFolder(root=imagenette_path, transform=transform)
+ dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
+ 
+ # Model setup
+ print("\n📥 Loading pretrained ViT backbone from torchvision...")
+ backbone = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
+ model = vit_with_fsg(backbone).to(device)
+ 
+ # Optimizer with separate LRs for FSG and base ViT
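+ # Note: the paper's Gradient Routing uses two optimizers with dedicated update phases;
+ # this demo approximates that with a single AdamW and two parameter groups at different LRs.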
+ fsg_params, base_params = [], []
+ for name, param in model.named_parameters():
+     if 'fsg_rgb_ls' in name:  # matches the gate parameters defined in vit_with_fsg.py
+         fsg_params.append(param)
+     else:
+         base_params.append(param)
+ 
+ lr_base = 1e-4
+ lr_fsg = 5e-4
+ print(f"\n🔧 Optimizer setup:")
+ print(f" 🔹 Base ViT parameters LR: {lr_base}")
+ print(f" 🔸 FSG parameters LR: {lr_fsg}")
+ 
+ optimizer = optim.AdamW([
+     {"params": base_params, "lr": lr_base},
+     {"params": fsg_params, "lr": lr_fsg}
+ ])
+ criterion = nn.CrossEntropyLoss()
+ 
+ # Training loop
+ epochs = 3
+ print(f"\n🚀 Starting demo training for {epochs} epochs...")
+ model.train()
+ for epoch in range(epochs):
+     steps_demo = 0  # to remove: for demo only
+     running_loss = 0.0
+     pbar = tqdm(dataloader, desc=f"Epoch {epoch+1}/{epochs}", ncols=100)
+     for inputs, targets in pbar:
+         if steps_demo > 25:  # to remove: for demo only
+             break  # to remove: for demo only
+         steps_demo += 1  # to remove: for demo only
+         inputs, targets = inputs.to(device), targets.to(device)
+         optimizer.zero_grad()
+         outputs = model(inputs)
+         loss = criterion(outputs, targets)
+         loss.backward()
+         optimizer.step()
+         running_loss += loss.item()
+         pbar.set_postfix({"loss": running_loss / (pbar.n + 1)})  # average loss over steps so far
+ 
+ print("\n✅ Training complete.")
+ 
+ # Save checkpoint
+ ckpt_dir = "./checkpoints"
+ os.makedirs(ckpt_dir, exist_ok=True)
+ ckpt_path = os.path.join(ckpt_dir, "fsg_vit_imagenette_demo.pth")
+ torch.save(model.state_dict(), ckpt_path)
+ print(f"💾 Checkpoint saved to: {ckpt_path}")
demo_training_mnist.py ADDED
@@ -0,0 +1,106 @@
+ '''
+ Demo training script for Feature Selection Gates (FSG) with ViT on the MNIST test set
+ 
+ This is a minimal demo: we train only on the MNIST test set (resized and converted to 3-channel)
+ for a few epochs to simulate training, save the checkpoint, and allow downstream inference.
+ 
+ Paper:
+     https://papers.miccai.org/miccai-2024/316-Paper0410.html
+ Code:
+     https://github.com/cosmoimd/feature-selection-gates
+ Contact:
+     giorgio.roffo@gmail.com
+ '''
+ 
+ import os
+ import warnings
+ import torch
+ import torch.nn as nn
+ import torch.optim as optim
+ import psutil
+ from tqdm import tqdm
+ from torchvision import transforms
+ from torchvision.datasets import MNIST
+ from torchvision.models import vit_b_16, ViT_B_16_Weights
+ from torch.utils.data import DataLoader
+ from vit_with_fsg import vit_with_fsg
+ 
+ warnings.filterwarnings("ignore")
+ 
+ # Device info
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ print(f"\n🖥️ Using device: {device}")
+ if device.type == "cuda":
+     print(f"🚀 CUDA device: {torch.cuda.get_device_name(0)}")
+     print(f"💾 GPU memory total: {torch.cuda.get_device_properties(0).total_memory / (1024 ** 3):.2f} GB")
+ print(f"🧠 System RAM: {psutil.virtual_memory().total / (1024 ** 3):.2f} GB")
+ 
+ # Dataset loading
+ print("\n📚 Loading MNIST test set for demo training (resized to 224x224, 3-channel)...")
+ transform = transforms.Compose([
+     transforms.Resize((224, 224)),
+     transforms.Grayscale(num_output_channels=3),
+     transforms.ToTensor(),
+     transforms.Normalize((0.5,), (0.5,))
+ ])
+ 
+ dataset = MNIST(root="./data", train=False, download=True, transform=transform)
+ dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
+ 
+ # Load ViT backbone and wrap with FSG
+ print("\n📥 Loading pretrained ViT backbone from torchvision...")
+ backbone = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
+ model = vit_with_fsg(backbone).to(device)
+ 
+ # Prepare optimizer with different LRs for FSG parameters and base model
+ fsg_params = []
+ base_params = []
+ for name, param in model.named_parameters():
+     if 'fsg_rgb_ls' in name:  # matches the gate parameters defined in vit_with_fsg.py
+         fsg_params.append(param)
+     else:
+         base_params.append(param)
+ 
+ # Assign a higher LR to FSG parameters, lower to base ViT params
+ lr_base = 1e-4
+ lr_fsg = 5e-4
+ print(f"\n🔧 Optimizer setup:")
+ print(f" 🔹 Base ViT parameters LR: {lr_base}")
+ print(f" 🔸 FSG parameters LR: {lr_fsg}")
+ 
+ optimizer = optim.AdamW([
+     {"params": base_params, "lr": lr_base},
+     {"params": fsg_params, "lr": lr_fsg}
+ ])
+ 
+ criterion = nn.CrossEntropyLoss()
+ epochs = 3
+ print(f"\n🚀 Starting demo training for {epochs} epochs...")
+ 
+ model.train()
+ for epoch in range(epochs):
+     steps_demo = 0  # to remove: for demo only
+     running_loss = 0.0
+     pbar = tqdm(dataloader, desc=f"Epoch {epoch+1}/{epochs}", ncols=100)
+     for inputs, targets in pbar:
+         if steps_demo > 25:  # to remove: for demo only
+             break  # to remove: for demo only
+         steps_demo += 1  # to remove: for demo only
+         inputs, targets = inputs.to(device), targets.to(device)
+         optimizer.zero_grad()
+         outputs = model(inputs)
+         loss = criterion(outputs, targets)
+         loss.backward()
+         optimizer.step()
+ 
+         running_loss += loss.item()
+         pbar.set_postfix({"loss": running_loss / (pbar.n + 1)})  # average loss over steps so far
+ 
+ print("\n✅ Training complete.")
+ 
+ # Save checkpoint
+ ckpt_dir = "./checkpoints"
+ os.makedirs(ckpt_dir, exist_ok=True)
+ ckpt_path = os.path.join(ckpt_dir, "fsg_vit_mnist_demo.pth")
+ torch.save(model.state_dict(), ckpt_path)
+ print(f"💾 Checkpoint saved to: {ckpt_path}")
vit_with_fsg.py ADDED
@@ -0,0 +1,109 @@
+ '''
+ ViTwithFSG: Vision Transformer wrapper with Feature Selection Gates (FSG)
+ 
+ This script defines a wrapper class to apply Feature Selection Gates (FSG) to a Vision Transformer (ViT) model.
+ FSG enhances model generalization by introducing sparse, learnable gates on the residual paths of attention and MLP blocks.
+ It is a form of architectural regularization designed for vision tasks and applicable to NLP tasks.
+ 
+ The method is introduced in:
+ 
+ @inproceedings{roffo2024FSG,
+   title={Feature Selection Gates with Gradient Routing for Endoscopic Image Computing},
+   author={Giorgio Roffo and Carlo Biffi and Pietro Salvagnini and Andrea Cherubini},
+   booktitle={MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, Marrakech, Morocco, October 2024},
+   year={2024},
+   organization={Springer}
+ }
+ 
+ - Publication: https://papers.miccai.org/miccai-2024/316-Paper0410.html
+ - Code: https://github.com/cosmoimd/feature-selection-gates
+ - Contact: giorgio.roffo@gmail.com
+ - Affiliation: Cosmo Intelligent Medical Devices (IMD), Lainate, Italy
+ '''
+ 
+ # imports
+ import warnings
+ warnings.filterwarnings("ignore")
+ 
+ import torch
+ import torch.nn as nn
+ from torchvision.models.vision_transformer import VisionTransformer
+ 
+ class FSGBlock(nn.Module):
+     """
+     A Transformer encoder block augmented with Feature Selection Gates (FSG).
+     Each residual path (attention and MLP) is weighted element-wise by a learnable sigmoid gate.
+     This promotes sparse activation and serves as a regularization mechanism to avoid overfitting.
+     """
+     def __init__(self, original_block):
+         super().__init__()
+         self.self_attention = original_block.self_attention  # Multi-head self-attention module
+         self.mlp = original_block.mlp  # Feedforward network (2-layer MLP)
+         self.ln_1 = original_block.ln_1  # LayerNorm before attention
+         self.ln_2 = original_block.ln_2  # LayerNorm before MLP
+         self.dropout = original_block.dropout  # Dropout after attention
+ 
+         dim = self.ln_1.normalized_shape[0]  # Dimensionality of the model
+ 
+         # FSG: learnable gates (one per channel), initialized with Xavier normal
+         self.fsg_rectifier = nn.Sigmoid()
+         self.fsg_rgb_ls1 = nn.Parameter(torch.empty(dim))  # Gate for attention path
+         self.fsg_rgb_ls2 = nn.Parameter(torch.empty(dim))  # Gate for MLP path
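+         # xavier_normal_ requires at least a 2-D tensor, so the 1-D gates are initialized
+         # through a temporary (1, dim) view; the view shares storage with the parameters.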
+         nn.init.xavier_normal_(self.fsg_rgb_ls1.unsqueeze(0), gain=nn.init.calculate_gain('sigmoid'))
+         nn.init.xavier_normal_(self.fsg_rgb_ls2.unsqueeze(0), gain=nn.init.calculate_gain('sigmoid'))
+ 
+     def forward(self, x):
+         # Self-attention + gate
+         x_norm = self.ln_1(x)
+         attn_output, _ = self.self_attention(x_norm, x_norm, x_norm, need_weights=False)
+         attn_output = self.dropout(attn_output)
+         fsg_scores_1 = self.fsg_rectifier(self.fsg_rgb_ls1)
+         x = x + attn_output * fsg_scores_1  # Residual connection weighted by gate
+ 
+         # MLP + gate
+         x_norm = self.ln_2(x)
+         mlp_output = self.mlp(x_norm)
+         fsg_scores_2 = self.fsg_rectifier(self.fsg_rgb_ls2)
+         x = x + mlp_output * fsg_scores_2  # Residual connection weighted by gate
+ 
+         return x
+ 
+ class ViTwithFSG(nn.Module):
+     """
+     Wrapper module that injects FSGBlocks into each Transformer encoder block of a given ViT model.
+     """
+     def __init__(self, vit_backbone: VisionTransformer):
+         super().__init__()
+         self.vit = vit_backbone
+         for i, blk in enumerate(self.vit.encoder.layers):
+             self.vit.encoder.layers[i] = FSGBlock(blk)  # Replace original block with FSGBlock
+ 
+     def forward(self, x):
+         return self.vit(x)
+ 
+ def vit_with_fsg(vit_backbone: VisionTransformer):
+     """
+     Factory function that wraps a torchvision VisionTransformer with FSG-enhanced encoder blocks.
+     """
+     return ViTwithFSG(vit_backbone)
+ 
+ 
+ # === Example Usage ===
+ if __name__ == "__main__":
+     import warnings
+     warnings.filterwarnings("ignore", message="Failed to load image Python extension*")
+ 
+     from torchvision.models import vit_b_16, ViT_B_16_Weights
+ 
+     print("\n📥 Loading pretrained ViT_B_16 backbone from torchvision...")
+     backbone = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
+ 
+     print("🔧 Wrapping with Feature Selection Gates (FSG)...")
+     model = vit_with_fsg(vit_backbone=backbone)
+ 
+     print("🧪 Running dummy input through FSG-augmented ViT...")
+     dummy_input = torch.randn(1, 3, 224, 224)
+     output = model(dummy_input)
+ 
+     print("✅ Inference completed.")
+     print("📐 Output shape:", output.shape)