SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery
Authors:
Jian Song¹,², Hongruixuan Chen¹, Weihao Xuan¹,², Junshi Xia², Naoto Yokoya¹,²
¹ The University of Tokyo
² RIKEN AIP
Conference: Neural Information Processing Systems (Spotlight), 2024
For more details, please refer to our paper and visit our GitHub repository.
Overview
TL;DR:
We release two models, one for height estimation and one for land cover mapping, trained on the SynRS3D dataset with our domain adaptation method, RS3DAda.
- Encoder: Vision Transformer (ViT-L), pretrained with DINOv2
- Decoder: DPT, trained from scratch
The models are intended for large-scale, global 3D semantic understanding from high-resolution remote sensing imagery and can be used out of the box or fine-tuned for related applications; a minimal usage sketch is given below.
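As a quick start, the sketch below shows how the DINOv2 ViT-L backbone can be loaded with the Hugging Face transformers library and used to extract patch features from one image tile. This is only an illustration under stated assumptions: the checkpoint file name (`rs3dada_height.pth`) and the commented-out weight-loading step are hypothetical placeholders, and the DPT decoder itself is not defined here; please follow the GitHub repository for the exact loading code.

```python
# Minimal sketch, not the official loading code: instantiate the DINOv2 ViT-L
# encoder and extract patch features from one image tile. The checkpoint path
# and the weight-loading step below are hypothetical placeholders.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-large")
encoder = AutoModel.from_pretrained("facebook/dinov2-large")  # ViT-L backbone
encoder.eval()

# Hypothetical: load the released RS3DAda weights (encoder + DPT decoder).
# state_dict = torch.load("rs3dada_height.pth", map_location="cpu")

image = Image.open("tile.png").convert("RGB")  # a high-resolution RS image tile
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    # Patch tokens that a DPT-style decoder would turn into a height map
    # or a land cover map.
    features = encoder(**inputs).last_hidden_state

print(features.shape)  # (1, num_patches + 1, 1024) for ViT-L
```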
How to Cite
If you find the RS3DAda model useful in your research, please consider citing:
@article{song2024synrs3d,
  title={SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery},
  author={Song, Jian and Chen, Hongruixuan and Xuan, Weihao and Xia, Junshi and Yokoya, Naoto},
  journal={arXiv preprint arXiv:2406.18151},
  year={2024}
}
Contact
For any questions or feedback, please reach out via email at song@ms.k.u-tokyo.ac.jp.
We hope you enjoy using the pretrained RS3DAda models!