markredito's Collections

Multimodal

updated Sep 7, 2024

  • Compositional Foundation Models for Hierarchical Planning

    Paper • 2309.08587 • Published Sep 15, 2023 • 11

  • DreamLLM: Synergistic Multimodal Comprehension and Creation

    Paper • 2309.11499 • Published Sep 20, 2023 • 58

  • VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning

    Paper • 2309.15091 • Published Sep 26, 2023 • 33

  • Context-Aware Meta-Learning

    Paper • 2310.10971 • Published Oct 17, 2023 • 17

  • Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

    Paper • 2310.11441 • Published Oct 17, 2023 • 28

  • MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

    Paper • 2310.09478 • Published Oct 14, 2023 • 21

  • VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

    Paper • 2403.00522 • Published Mar 1, 2024 • 47

  • Building and better understanding vision-language models: insights and future directions

    Paper • 2408.12637 • Published Aug 22, 2024 • 131