Collections

Collections including paper arxiv:2307.06304

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published May 14, 2024 • 25

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 31

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 131

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20, 2024 • 39

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Paper • 2307.06304 • Published Jul 12, 2023 • 31

Sora Reference Papers

A collection of all papers referenced in OpenAI's "Video generation models as world simulators" technical report • openai.com/sora

Unsupervised Learning of Video Representations using LSTMs

Paper • 1502.04681 • Published Feb 16, 2015 • 1

Recurrent Environment Simulators

Paper • 1704.02254 • Published Apr 7, 2017 • 2

World Models

Paper • 1803.10122 • Published Mar 27, 2018 • 3

Generating Videos with Scene Dynamics

Paper • 1609.02612 • Published Sep 8, 2016 • 1

Photorealistic Video Generation with Diffusion Models

Paper • 2312.06662 • Published Dec 11, 2023 • 24

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Paper • 2307.06304 • Published Jul 12, 2023 • 31

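Sora Reference Papers

The reference papers listed at the end of OpenAI's "Video generation models as world simulators" technical report, 32 in total. Two of them, OpenAI's ImageGPT and DALL·E 3 papers, are missing from the collection; links to both have been added in the note.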

Unsupervised Learning of Video Representations using LSTMs

Paper • 1502.04681 • Published Feb 16, 2015 • 1

Recurrent Environment Simulators

Paper • 1704.02254 • Published Apr 7, 2017 • 2

World Models

Paper • 1803.10122 • Published Mar 27, 2018 • 3

Generating Videos with Scene Dynamics

Paper • 1609.02612 • Published Sep 8, 2016 • 1

DocGraphLM: Documental Graph Language Model for Information Extraction

Paper • 2401.02823 • Published Jan 5, 2024 • 37

Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4, 2024 • 66

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 189

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

Paper • 2309.01131 • Published Sep 3, 2023 • 1
