GridNet-HD Baseline: Late Fusion MLP on Dual Softmax Outputs
Overview
This repository provides an implementation of a simple Multi-Layer Perceptron (MLP) baseline for late fusion of two LiDAR softmax outputs. Before using this baseline, the results from the two other baselines are required. This repository includes:
- Per-zone preprocessing of two LiDAR softmax files (`image_vote` and `spt`) into combined feature tensors
- A lightweight `SimpleMLP` model that concatenates the two softmax vectors per point
- Training, validation and inference loops
- Weights & Biases integration for real-time experiment tracking
This implementation serves as one of the official baselines for GridNet-HD.
Table of Contents
- Project Structure
- Configuration
- Environment
- Dataset Structure
- Installation
- Supported Modes
- Results
- Pretrained Weights
- Usage Examples
- Weights & Biases Integration
- License
- Contact
- Citation
Project Structure
```
project_root/
├── main.py                              # Entry point (all modes)
├── config.yaml                          # Configuration parameters
├── dataset/
│   ├── lidar_dataset.py                 # Dataset for training/validation/test
│   └── preprocess_multi_processing.py   # Prepare files for training
├── las_utils/
│   ├── io.py                            # .las reading
│   └── matching.py                      # Nearest-neighbor matching
├── model/
│   └── model.py                         # SimpleMLP definition
├── train/
│   ├── train.py                         # Train and eval loop
│   └── test.py                          # Inference function for test split
├── utils/
│   ├── logging_utils.py
│   └── metrics.py                       # Compute all metrics
├── requirements.txt                     # Python dependencies
└── README.md                            # This file
```
Configuration
All training and evaluation settings are stored in a single file: `config.yaml`.
Key sections
dataset
Controls data loading, preprocessing, and class remapping.
- `root`: path to the folder containing all raw zones
- `split_file`: path to a JSON file defining train/val/test splits
- `n_classes`: number of target classes
- `voxel_size`: downsampling voxel size (used for KDTree)
- `pre-processing_num_workers`: parallelism for data preprocessing
- `max_point_per_class`: maximum number of points sampled per class during training
- `class_map`: label remapping rules (original → new class)
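A minimal sketch of how such a `class_map` could be applied to raw labels; the mapping values and the `ignore_index` below are placeholders, not taken from the repository (the real remapping happens in `dataset/preprocess_multi_processing.py`):

```python
import numpy as np

class_map = {0: 0, 1: 1, 5: 2, 6: 2}   # hypothetical original -> new label mapping
ignore_index = 255                      # assumed "ignore" value

def remap_labels(labels: np.ndarray) -> np.ndarray:
    """Remap labels via a lookup table; labels absent from class_map become ignore_index."""
    size = max(int(labels.max()), max(class_map)) + 1
    lut = np.full(size, ignore_index, dtype=np.int64)
    for src, dst in class_map.items():
        lut[src] = dst
    return lut[labels]

print(remap_labels(np.array([0, 1, 5, 6, 9])))   # -> [0 1 2 2 255]
```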
training
Hyperparameters and runtime configuration.
- `batch_size`, `epochs`, `learning_rate`, `weight_decay`
- `lr_step_size`, `lr_gamma`: learning rate scheduler (StepLR)
- `device`: `"cuda"` or `"cpu"`
model
Defines the MLP structure.
- `hidden_dims`: list of layer widths
- `ignore_index`: label to ignore during loss computation
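For orientation, a sketch of what a per-point MLP over the two concatenated softmax vectors can look like. The actual definition lives in `model/model.py`; the hidden sizes, activation, and loss shown here are assumptions:

```python
import torch.nn as nn

class SimpleMLP(nn.Module):
    """Per-point MLP over the concatenated ImageVote and SPT softmax vectors (sketch)."""
    def __init__(self, n_classes: int, hidden_dims=(128, 64)):
        super().__init__()
        layers, in_dim = [], 2 * n_classes          # two softmax vectors concatenated per point
        for h in hidden_dims:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, n_classes)) # per-point class logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):                            # x: (n_points, 2 * n_classes)
        return self.net(x)

# Loss with the configured ignore label, e.g.:
# criterion = nn.CrossEntropyLoss(ignore_index=255)
```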
logging
Output and checkpoint configuration.
- `save_dir`: where to store logs and model weights
- `save_freq`: save checkpoint every N epochs
wandb
Weights & Biases experiment tracking.
- `project`: W&B project name
- `entity`: your W&B team or username
Environment
| Component    | Details                  |
|--------------|--------------------------|
| GPU          | NVIDIA A40 (48 GB VRAM)  |
| CUDA Version | 12.x                     |
| OS           | Ubuntu 22.04 LTS         |
| RAM          | 256 GB                   |
Dataset Structure
The structure of the GridNet-HD dataset remains the same (see the GridNet-HD dataset for more information). Raw zones (36 folders) are completed with the results from the two other baselines (soft-log LiDAR from ImageVote and SPT):
```
/path/to/data/
├── t1z5b/
│   ├── lidar_softmax_image_vote/t1z4_with_softmax.las   # LiDAR with soft-log from ImageVote baseline
│   ├── lidar_softmax_spt/t1z4_with_softmax.las          # LiDAR with soft-log from SPT baseline
│   └── lidar/t1z4.las                                   # ground-truth
├── …
└── split.json                                           # maps zones → train/val/test
```
After preprocessing:
```
/path/to/data/preprocessed/
├── t1z4.pt    # contains {features and labels}
├── t1z5a.pt
└── …
```
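A quick sketch for inspecting one preprocessed zone; the exact dictionary keys are an assumption based on the "{features and labels}" description above:

```python
import torch

data = torch.load("/path/to/data/preprocessed/t1z4.pt")
features, labels = data["features"], data["labels"]   # assumed key names
print(features.shape)   # expected (n_points, 2 * n_classes): ImageVote softmax + SPT softmax
print(labels.shape)     # expected (n_points,)
```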
Installation
Clone the repository:
```bash
git clone https://github.com/your-org/baseline_fusion_mlp.git
cd baseline_fusion_mlp
```
Create a conda virtual environment:
```bash
conda create -n gridnet_hd_mlp python=3.12
conda activate gridnet_hd_mlp
```
Install dependencies:
```bash
pip install --upgrade pip
pip install -r requirements.txt
```
Supported Modes
Use `--mode` in `main.py`:
| Mode         | Description                                  |
|--------------|----------------------------------------------|
| `preprocess` | Convert all zones .las → .pt with remapping  |
| `train`      | Train SimpleMLP on train split               |
| `val`        | Validate model on val split                  |
| `test`       | Evaluate on test split                       |
Results
The following table summarizes the per-class Intersection over Union (IoU) scores on the test set at the 3D level for the best model.
| Class                  | IoU (Test set) (%) |
|------------------------|--------------------|
| Pylon                  | 94.82 |
| Conductor cable        | 94.40 |
| Structural cable       | 82.52 |
| Insulator              | 86.98 |
| High vegetation        | 83.08 |
| Low vegetation         | 47.64 |
| Herbaceous vegetation  | 80.75 |
| Rock, gravel, soil     | 42.89 |
| Impervious soil (Road) | 80.26 |
| Water                  | 61.69 |
| Building               | 61.40 |
| Mean IoU (mIoU)        | 74.22 |
Pretrained Weights
Checkpoints for the best-performing model (mIoU = 74.22%) are available directly in the repository.
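A sketch of loading the released checkpoint for evaluation; the constructor arguments and the checkpoint layout (plain state_dict vs. wrapped dict) are assumptions and should be matched to your `config.yaml`:

```python
import torch
from model.model import SimpleMLP   # repository module, see the project structure above

model = SimpleMLP(n_classes=11, hidden_dims=[128, 64])    # placeholder arguments
state = torch.load("best_model.pt", map_location="cpu")
if isinstance(state, dict) and "model_state_dict" in state:
    state = state["model_state_dict"]                      # unwrap a full training checkpoint
model.load_state_dict(state)
model.eval()
```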
Usage Examples
Before training the model, use the `preprocess` mode and configure the `config.yaml` file accordingly.
Preprocessing
```bash
python main.py --mode preprocess --config config.yaml
```
This will concatenate features from SPT soft-log and ImageVote soft-log, apply remapping, and prepare files for training.
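An illustrative sketch of the nearest-neighbor step that aligns the two point clouds before concatenation (the repository's version lives in `las_utils/matching.py`); the array names below are hypothetical:

```python
import numpy as np
from scipy.spatial import cKDTree

def match_nearest(xyz_ref: np.ndarray, xyz_other: np.ndarray) -> np.ndarray:
    """For each reference point, return the index of its nearest neighbor in the other cloud."""
    _, idx = cKDTree(xyz_other).query(xyz_ref, k=1)
    return idx

# idx = match_nearest(xyz_lidar, xyz_spt)
# features = np.concatenate([softmax_image_vote, softmax_spt[idx]], axis=1)
```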
Training
```bash
python main.py --mode train --config config.yaml
```
Trains the MLP late fusion model using the dataset and settings defined in `config.yaml`. Checkpoints and logs are saved under `logging.save_dir`.
Validation
```bash
python main.py --mode val --config config.yaml --weights best_model.pt
```
Evaluates the model on the validation set and prints out per-class IoUs and mIoU.
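The metrics themselves are computed in `utils/metrics.py`; as a generic sketch, per-class IoU and mIoU follow the standard confusion-matrix formula:

```python
import numpy as np

def per_class_iou(conf_mat: np.ndarray) -> np.ndarray:
    """IoU per class from a confusion matrix: TP / (TP + FP + FN)."""
    tp = np.diag(conf_mat).astype(float)
    fp = conf_mat.sum(axis=0) - tp
    fn = conf_mat.sum(axis=1) - tp
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)

# ious = per_class_iou(conf_mat)
# miou = float(np.nanmean(ious))
```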
Test (LAS export)
```bash
python main.py --mode test --config config.yaml --weights best_model.pt
```
Runs inference on the test set and exports the original `.las` files with the `classification` field, which contains the predicted class label for each point.
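A sketch of what that export step can look like with `laspy` (the actual writer is in `train/test.py`); the file names and the `preds` array are placeholders:

```python
import laspy
import numpy as np

las = laspy.read("t1z4.las")
preds = np.zeros(len(las.points), dtype=np.uint8)   # per-point predicted labels (placeholder)
las.classification = preds
las.write("t1z4_pred.las")
```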
Weights & Biases Integration
- Login: `wandb login`
- Set `logging.wandb.project` and `logging.wandb.entity` in `config.yaml`.
All training and validation metrics will be tracked live.
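A minimal sketch of the W&B calls involved; the project, entity, and logged values below are placeholders and should match the `wandb` block of `config.yaml`:

```python
import wandb

run = wandb.init(project="gridnet-hd-mlp-fusion", entity="your-team")
wandb.log({"train/loss": 0.42, "val/mIoU": 0.74, "epoch": 1})   # placeholder metrics
run.finish()
```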
License
This project is released under the MIT License.
Contact
For questions, issues, or contributions, please open an issue on the repository.
Citation
If you use this repo in research, please cite:
GridNet-HD: A High-Resolution Multi-Modal Dataset for LiDAR-Image Fusion on Power Line Infrastructure
Masked Authors
Submitted to NeurIPS 2025.