Collections including paper arxiv:2401.17270
- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 41
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Paper • 2401.14405 • Published • 13
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 18
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 30

- GLIGEN: Open-Set Grounded Text-to-Image Generation
  Paper • 2301.07093 • Published • 4
- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 41
- DETRs Beat YOLOs on Real-time Object Detection
  Paper • 2304.08069 • Published • 13
- RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
  Paper • 2407.17140 • Published • 2

- Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
  Paper • 2404.07973 • Published • 33
- COCONut: Modernizing COCO Segmentation
  Paper • 2404.08639 • Published • 31
- GLIGEN: Open-Set Grounded Text-to-Image Generation
  Paper • 2301.07093 • Published • 4
- Grounded Language-Image Pre-training
  Paper • 2112.03857 • Published • 3

- YOLOE: Real-Time Seeing Anything
  Paper • 2503.07465 • Published • 14
- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 41
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
  Paper • 2410.12628 • Published • 40