# CLIP-RT Pretrained on OXE Data

This is the CLIP-RT model pretrained on Open X-Embodiment (OXE) data. We finetuned this model on downstream data, such as robot data collected in real-world or simulated environments. Please refer to the clip-rt GitHub repository for instructions on finetuning this model; a rough sketch of loading the checkpoint follows below.
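As a minimal sketch of how a CLIP-style checkpoint like this one might be loaded for finetuning with open_clip (the backbone name and checkpoint filename here are illustrative assumptions, not confirmed by this card; the clip-rt repository documents the actual procedure):

```python
# Hedged sketch: loading a CLIP-style checkpoint with open_clip for finetuning.
# The architecture name and checkpoint path are illustrative assumptions;
# consult the clip-rt repository for the exact model config.
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14",                  # assumed backbone name
    pretrained="cliprt_oxe.pt",  # assumed local checkpoint filename
)
tokenizer = open_clip.get_tokenizer("ViT-H-14")
model.train()  # switch to training mode before finetuning
```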

## Hyperparameters

| Category   | Details                         |
|------------|---------------------------------|
| Training   | 8 × H100 GPUs (80 GB VRAM each) |
| Model size | 1B parameters                   |
| Loss       | Binary cross-entropy            |
| Epochs     | 20                              |
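The binary cross-entropy loss above scores image-text similarities against 0/1 match labels. A minimal PyTorch sketch of this kind of objective (the shapes, variable names, and logit scaling are assumptions for illustration, not the repository's exact code):

```python
import torch
import torch.nn.functional as F

# Hedged sketch of a binary cross-entropy objective over image-text similarities.
# `img_emb` / `txt_emb` stand in for L2-normalized image and text embeddings;
# `labels` marks which pairs match (1) or not (0). Shapes are illustrative.
img_emb = F.normalize(torch.randn(8, 512), dim=-1)
txt_emb = F.normalize(torch.randn(8, 512), dim=-1)
labels = torch.eye(8)                   # assume only diagonal pairs match

logits = img_emb @ txt_emb.t() * 100.0  # scaled cosine similarities
loss = F.binary_cross_entropy_with_logits(logits, labels)
```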

## Citation

```bibtex
@article{kang2024cliprt,
  title={CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision},
  author={Kang, Gi-Cheon and Kim, Junghyun and Shim, Kyuhwan and Lee, Jun Ki and Zhang, Byoung-Tak},
  journal={arXiv preprint arXiv:2411.00508},
  year={2024}
}
```