groffo committed · 070887a (parent: d7087a8)
Improve Hugging Face model card formatting and content

README.md:
---
tags:
- vision
- transformers
- vit
- feature-selection
- miccai2024
license: mit
library_name: PyTorch
inference: false
---
# 🔬 Feature Selection Gates (FSG) for Vision Transformers (ViT)

This repository implements **Feature Selection Gates (FSG)** and **Gradient Routing (GR)** as a modular extension to Vision Transformers. It is based on our paper presented at **MICCAI 2024**:

> **Feature Selection Gates with Gradient Routing for Endoscopic Image Computing**
> Giorgio Roffo, Carlo Biffi, Pietro Salvagnini, Andrea Cherubini
> [MICCAI 2024](https://papers.miccai.org/miccai-2024/316-Paper0410.html), [arXiv](https://arxiv.org/abs/2407.04400), [GitHub](https://github.com/cosmoimd/feature-selection-gates)

---
## 🧠 What Is FSG?

**FSG** introduces learnable gates on the residual branches within Transformer layers. These gates:

- Dynamically select relevant features
- Promote **sparse connectivity** during training
- Serve as a form of **architectural regularization**

To stabilize learning, **Gradient Routing (GR)** performs a **dual-pass** strategy:

- One forward pass to compute gradients for the base model
- A separate route to update FSG parameters independently

This separation allows **task-specific tuning** and ensures stable learning; a minimal sketch of both ideas follows.
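
For intuition only, here is a minimal sketch of the two mechanisms, not the repository's implementation: `FSGResidual`, the sigmoid gate parameterization, and the two-optimizer routing are illustrative assumptions; see `vit_with_fsg.py` and the paper for the exact formulation.

```python
import torch
import torch.nn as nn

class FSGResidual(nn.Module):
    """Illustrative gated residual: y = x + sigmoid(g) * f(x).

    The sigmoid keeps each per-feature gate in (0, 1), so gates driven
    toward 0 prune their features (sparse connectivity / regularization).
    """
    def __init__(self, branch: nn.Module, dim: int):
        super().__init__()
        self.branch = branch                        # e.g. an attention or MLP sub-block
        self.gate = nn.Parameter(torch.zeros(dim))  # one learnable gate per feature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + torch.sigmoid(self.gate) * self.branch(x)

# Gradient-routing flavour: route updates through two optimizers with
# separate learning rates, so FSG parameters are tuned independently
# of the base model.
model = FSGResidual(nn.Linear(16, 16), dim=16)
gate_params = [p for n, p in model.named_parameters() if "gate" in n]
base_params = [p for n, p in model.named_parameters() if "gate" not in n]
opt_base = torch.optim.AdamW(base_params, lr=1e-4)
opt_gate = torch.optim.AdamW(gate_params, lr=1e-3)

loss = model(torch.randn(2, 16)).pow(2).mean()  # dummy objective
loss.backward()                                 # one backward pass...
opt_base.step(); opt_gate.step()                # ...two independent updates
opt_base.zero_grad(); opt_gate.zero_grad()
```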
---
## 💡 Key Features

- ✅ **Drop-in**: easily wraps any `torchvision` ViT model (e.g. `vit_b_16`, `vit_l_16`)
- ✅ **General-purpose**: use it on **natural images**, **medical data**, and even **token sequences in NLP**
- ✅ **Regularizes ViTs** in low-data regimes (tested on CIFAR-100, endoscopic videos, etc.)
- ✅ **No ViT surgery**: FSG wraps the Transformer layers directly

While this method was originally proposed for **polyp size estimation in colonoscopy**, it is designed to generalize across:

- 🧬 Medical image analysis
- 🖼️ General image classification
- 📝 NLP Transformers (e.g. GPT, BERT), by applying FSG over token embeddings (sketched below)

We strongly encourage researchers to test FSG in **non-medical** domains, including multimodal fusion.
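
A minimal sketch of the NLP idea, assuming FSG is applied elementwise over token embeddings; `TokenFSG` and the dimensions below are hypothetical and not part of this repository.

```python
import torch
import torch.nn as nn

class TokenFSG(nn.Module):
    """Hypothetical gate over token embeddings: scales each embedding
    dimension by a learnable factor in (0, 1) before the encoder."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(embed_dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, embed_dim)
        return torch.sigmoid(self.gate) * tokens

embed = nn.Embedding(30522, 768)         # BERT-sized vocabulary and width
fsg = TokenFSG(768)
ids = torch.randint(0, 30522, (2, 128))  # dummy token ids
gated = fsg(embed(ids))                  # (2, 128, 768), gated features
```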
---

## 🧪 Minimal Example

Use the `vit_with_fsg.py` script to augment a pretrained ViT from `torchvision`:
```python
from torchvision.models import vit_b_16, ViT_B_16_Weights
from vit_with_fsg import vit_with_fsg
import torch

print("📥 Loading pretrained ViT...")
backbone = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)

print("🔧 Injecting FSG into backbone...")
model = vit_with_fsg(vit_backbone=backbone)

print("🧪 Running dummy input...")
dummy_input = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
output = model(dummy_input)
print("✅ Output shape:", output.shape)
```
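
With the default ImageNet-pretrained head this prints `torch.Size([1, 1000])`, assuming the FSG wrapper leaves the classification head of `vit_b_16` unchanged.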
---

## 🧪 Demos (Quick Training + Inference)

We provide full working training and inference examples; each training script saves a checkpoint for reproducible inference. Run them as shown below.

| Dataset    | Training Script          | Inference Script          | Checkpoint Path                             |
|------------|--------------------------|---------------------------|---------------------------------------------|
| MNIST      | `demo_training_mnist.py` | `demo_inference_mnist.py` | `./checkpoints/fsg_vit_mnist_demo.pth`      |
| Imagenette | `demo_training_imnet.py` | `demo_inference_imnet.py` | `./checkpoints/fsg_vit_imagenette_demo.pth` |
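
For example (assuming you run from the repository root, so the relative checkpoint paths resolve):

```bash
# Train on Imagenette
python demo_training_imnet.py

# Inference on Imagenette
python demo_inference_imnet.py --checkpoint ./checkpoints/fsg_vit_imagenette_demo.pth
```

```bash
# Train on MNIST
python demo_training_mnist.py

# Inference on MNIST
python demo_inference_mnist.py --checkpoint ./checkpoints/fsg_vit_mnist_demo.pth
```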
> ⚠️ These demos use reduced test sets and train for only a few iterations so they run quickly. They are not meant for benchmarking, but for showcasing FSG integration and the API.
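
To reuse a demo checkpoint outside the provided scripts, a minimal sketch follows; it assumes the demos save a plain `state_dict` for the FSG-wrapped model, which you should verify against the demo scripts (the Imagenette demo, for instance, may use a different classification head).

```python
import torch
from torchvision.models import vit_b_16
from vit_with_fsg import vit_with_fsg

# Rebuild the architecture, then restore the demo weights.
model = vit_with_fsg(vit_backbone=vit_b_16())

# Assumption: the checkpoint is a plain state_dict; adjust if the demo
# wraps it (e.g. under a "model" key) or uses a different head size.
state = torch.load("./checkpoints/fsg_vit_imagenette_demo.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()
```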
---
## 📦 Project Structure

```
.
├── vit_with_fsg.py          # FSG-ViT integration
├── demo_training_mnist.py
├── demo_inference_mnist.py
├── demo_training_imnet.py
├── demo_inference_imnet.py
├── checkpoints/             # Model weights (optional)
└── README.md                # This model card
```
---

## 📄 Citation

If you use this project, please cite our work:

```bibtex
@inproceedings{roffo2024FSG,
  …
}
```
---

## 📬 Contact

**Giorgio Roffo**
📧 giorgio.roffo@gmail.com
🏢 Cosmo Intelligent Medical Devices (IMD), Lainate, Italy
🔗 [github.com/cosmoimd/feature-selection-gates](https://github.com/cosmoimd/feature-selection-gates)