VideoMAE-2 for Dashcam Collision Prediction
This repository contains a model for predicting vehicle collisions from dashcam footage, developed for the Nexar Dashcam Collision Prediction Challenge. The model placed 29th on the public leaderboard with a mean Average Precision (mAP) of 0.76.
The training code is available on GitHub.
Model Overview
Architecture: VideoMAE-2 Large fine-tuned for binary classification (collision/near-miss vs. normal driving).
Feature Extraction: Input frames are preprocessed with the TimeSformer feature extractor.
Input: 16 frames per video, each resized to 224x224 pixels.
Output: Probability score indicating the likelihood of a collision or near-miss event.
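The input and output contract above can be sketched in plain Python. This is a minimal illustration, not the repository's inference code: it assumes the classification head emits two logits and that index 1 is the collision/near-miss class, neither of which is stated in this card. The names `EXPECTED_CLIP_SHAPE` and `collision_probability` are hypothetical.

```python
import math

# Clip layout described above: 16 RGB frames, each 224x224 pixels.
NUM_FRAMES = 16
FRAME_SIZE = 224
EXPECTED_CLIP_SHAPE = (NUM_FRAMES, 3, FRAME_SIZE, FRAME_SIZE)


def collision_probability(logits, positive_index=1):
    """Turn raw class logits into a probability with a stable softmax.

    Assumption: `positive_index` marks the collision/near-miss class.
    """
    shift = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    return exps[positive_index] / sum(exps)


# Equal logits mean the model has no preference between the two classes.
probability = collision_probability([0.0, 0.0])
```

A score near 1.0 would then indicate a likely collision or near-miss, and a score near 0.0 normal driving.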
Dataset
The model was trained on the Nexar Collision Prediction Dataset, which contains:
750 non-collision videos
400 collision videos
350 near-miss videos (arXiv:2503.03848)
Each video is annotated with:
Event Type: Collision, near-miss, or normal driving
Event Time: Timestamp of the (near-)collision
Alert Time: Earliest time the event could be predicted
For more details, refer to the dataset paper.
Preprocessing Pipeline
Frame Extraction: Sampled 16 frames per video, focusing on the interval around the alert time.
Feature Extraction: Applied the TimeSformer feature extractor to convert the sampled frames into model-ready pixel values.
Data Augmentation: Implemented transformations such as horizontal flip, rotation, color jitter, and resized cropping.
Normalization: Used ImageNet mean and standard deviation for normalization.
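The frame sampling and normalization steps can be sketched as follows. This is an illustration under stated assumptions, not the author's pipeline: the card does not say how wide the interval around the alert time is, so the `before_s` and `after_s` defaults are placeholders, and both function names are hypothetical. The mean and standard deviation are the standard ImageNet RGB statistics.

```python
# Standard ImageNet per-channel statistics (RGB, for pixels scaled to [0, 1]).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def sample_frame_indices(alert_time_s, fps, total_frames,
                         num_frames=16, before_s=1.0, after_s=1.0):
    """Pick `num_frames` evenly spaced frame indices around the alert time.

    Assumption: the window spans `before_s` seconds before and `after_s`
    seconds after the alert time; the real window size is not documented.
    """
    start = max(0, int(round((alert_time_s - before_s) * fps)))
    end = min(total_frames - 1, int(round((alert_time_s + after_s) * fps)))
    if num_frames == 1 or end <= start:
        return [start] * num_frames
    step = (end - start) / (num_frames - 1)
    return [int(round(start + i * step)) for i in range(num_frames)]


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then apply ImageNet mean/std."""
    return tuple((c / 255.0 - m) / s
                 for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


# Example: a 20 s clip at 30 fps with the alert at 10 s.
indices = sample_frame_indices(alert_time_s=10.0, fps=30, total_frames=600)
```

In practice the feature extractor handles resizing and normalization of whole frames; the per-pixel function only shows the arithmetic involved.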
Training Details
Framework: PyTorch with Hugging Face Transformers and Trainer API.
Training Configuration:
Batch Size: 4
Epochs: 15
Learning Rate: 3e-5
Weight Decay: 0.01
Evaluation Strategy: Per epoch
Metric for Best Model: Average Precision
Hardware: Trained on 2x NVIDIA T4 GPUs (~4.5 hours)
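The configuration above could map onto Trainer arguments roughly as shown below. This is a sketch, not the training script: it assumes the batch size of 4 is per device, that checkpoints are saved each epoch so the best one can be reloaded, and that the metrics function reports a key named `average_precision` (a hypothetical name). The output directory is a placeholder.

```python
# Keyword arguments intended for transformers.TrainingArguments.
training_kwargs = {
    "output_dir": "./videomae-collision",  # placeholder path
    "per_device_train_batch_size": 4,      # assumption: 4 is per device
    "num_train_epochs": 15,
    "learning_rate": 3e-5,
    "weight_decay": 0.01,
    # Named `evaluation_strategy` in older transformers releases.
    "eval_strategy": "epoch",
    "save_strategy": "epoch",              # assumption: matches evaluation
    "load_best_model_at_end": True,        # assumption: best model is kept
    "metric_for_best_model": "average_precision",  # hypothetical key name
    "greater_is_better": True,
}

# With transformers installed, this would be used as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
```

Keeping the values in a plain dictionary makes it easy to log them alongside each run.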
Evaluation Metrics
The model's performance was evaluated using mean Average Precision (mAP) at three time-to-accident intervals:
500ms
1000ms
1500ms
The final score is the mean of the Average Precision (AP) values at these intervals, emphasizing early and accurate collision predictions.
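The scoring rule can be illustrated with a small pure-Python sketch. It is not the challenge's official scorer: AP is computed here as the mean precision at each positive in the ranked list, which assumes no tied scores, and the labels and scores are toy values invented for the example.

```python
def average_precision(labels, scores):
    """AP as the mean of precision values at the rank of each positive.

    Assumption: scores contain no ties, so the ranking is unambiguous.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("at least one positive label is required")
    return sum(precisions) / len(precisions)


def mean_ap(ap_by_interval_ms):
    """Average the per-interval AP values into the final score."""
    return sum(ap_by_interval_ms.values()) / len(ap_by_interval_ms)


# Toy (labels, scores) per time-to-accident interval in milliseconds.
toy_predictions = {
    500: ([1, 0, 1, 0], [0.9, 0.2, 0.8, 0.1]),
    1000: ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    1500: ([1, 0, 1, 0], [0.6, 0.7, 0.5, 0.1]),
}
ap_by_interval = {t: average_precision(y, s)
                  for t, (y, s) in toy_predictions.items()}
final_score = mean_ap(ap_by_interval)
```

Because each interval contributes equally, weak predictions at 1500 ms lower the final score as much as weak predictions at 500 ms would.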
Citation
If you use this model or dataset, please cite:
@misc{nexar2025dashcamcollisionprediction,
title={Nexar Dashcam Collision Prediction Dataset and Challenge},
author={Daniel C. Moura and Shizhan Zhu and Orly Zvitia},
year={2025},
eprint={2503.03848},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.03848}
}
Model tree for jatinmehra/Accident-Detection-using-Dashcam
Base model: MCG-NJU/videomae-large-finetuned-kinetics