# CLIP-RT Pretrained on OXE Data
This is the CLIP-RT model pretrained on Open X-Embodiment (OXE) data. We fine-tuned this model on downstream data, such as robot demonstrations collected in real-world or simulated environments. Please refer to the clip-rt GitHub repository for instructions on fine-tuning this model.
## Hyperparameters
| Category | Details |
|---|---|
| Training hardware | 8 × H100 GPUs (80GB VRAM each) |
| Model size | 1B parameters |
| Loss | Binary cross-entropy |
| Epochs | 20 |
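As a rough illustration of the binary cross-entropy objective listed above, the sketch below scores image–language-action pairs by cosine similarity and applies BCE over the resulting logits. All names, shapes, and the temperature value are illustrative assumptions, not taken from the CLIP-RT codebase.

```python
import torch
import torch.nn.functional as F

def bce_contrastive_loss(image_emb, text_emb, labels, temperature=0.07):
    """Illustrative BCE loss over an image-text similarity matrix.

    image_emb: (B, D) image embeddings
    text_emb:  (B, D) language-action embeddings
    labels:    (B, B) binary matrix; 1 marks a positive pair
    """
    # Normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # (B, B) similarity logits, scaled by a temperature.
    logits = image_emb @ text_emb.T / temperature
    # Each entry is treated as an independent binary classification.
    return F.binary_cross_entropy_with_logits(logits, labels)

if __name__ == "__main__":
    B, D = 4, 8
    img = torch.randn(B, D)
    txt = torch.randn(B, D)
    labels = torch.eye(B)  # diagonal pairs are the positives
    print(bce_contrastive_loss(img, txt, labels).item())
```

Unlike the softmax-based InfoNCE loss of the original CLIP, a BCE formulation treats each pair independently, which allows multiple positives per image.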
## Citation
```bibtex
@article{kang2024cliprt,
  title={CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision},
  author={Kang, Gi-Cheon and Kim, Junghyun and Shim, Kyuhwan and Lee, Jun Ki and Zhang, Byoung-Tak},
  journal={arXiv preprint arXiv:2411.00508},
  year={2024}
}
```