ViStoryBench: Comprehensive Benchmark Suite for Story Visualization
Abstract
ViStoryBench is a comprehensive evaluation benchmark for story visualization frameworks, featuring diverse datasets and metrics to assess model performance across narrative and visual dimensions.
Story visualization, which aims to generate a sequence of visually coherent images aligning with a given narrative and reference images, has seen significant progress with recent advancements in generative models. To further enhance the performance of story visualization frameworks in real-world scenarios, we introduce a comprehensive evaluation benchmark, ViStoryBench. We collect a diverse dataset encompassing various story types and artistic styles, ensuring models are evaluated across multiple dimensions such as different plots (e.g., comedy, horror) and visual aesthetics (e.g., anime, 3D renderings). ViStoryBench is carefully curated to balance narrative structures and visual elements, featuring stories with single and multiple protagonists to test models' ability to maintain character consistency. Additionally, it includes complex plots and intricate world-building to challenge models in generating accurate visuals. To ensure comprehensive comparisons, our benchmark incorporates a wide range of evaluation metrics assessing critical aspects. This structured and multifaceted framework enables researchers to thoroughly identify both the strengths and weaknesses of different models, fostering targeted improvements.
Community
Project Page: https://vistorybench.github.io/
Dataset: https://huggingface.co/datasets/ViStoryBench/ViStoryBench
Code: https://github.com/vistorybench/vistorybench
Browse Stories: https://vistorybench.github.io/story_detail
The following similar papers were recommended by the Semantic Scholar API:
- STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives (2025)
- CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition (2025)
- EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models (2025)
- VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? (2025)
- Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation (2025)
- DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? (2025)
- Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation (2025)