Jiaqi-hkust committed (verified)
Commit 6efb3d2 · 1 Parent(s): ef0f225

Update README.md

Files changed (1)
  1. README.md +14 -199
README.md CHANGED
@@ -1,199 +1,14 @@
- <div align="center">
-
- # Hawk: Learning to Understand Open-World Video Anomalies
-
- <div align="center">
-
- ### This is the official repository for [Hawk](https://arxiv.org/pdf/2405.16886).
-
- [Jiaqi Tang^](https://jqt.me/), [Hao Lu^](https://scholar.google.com/citations?user=OOagpAcAAAAJ&hl=en), [Ruizheng Wu](https://scholar.google.com/citations?user=OOagpAcAAAAJ&hl=en), [Xiaogang Xu](https://xuxiaogang.com/), [Ke Ma](https://scholar.google.com.hk/citations?user=yXGNGS8AAAAJ&hl=en), [Cheng Fang](),
- \
- [Bin Guo](http://www.guob.org/), [Jiangbo Lu](https://sites.google.com/site/jiangbolu), [Qifeng Chen](https://cqf.io/) and [Ying-Cong Chen*](https://www.yingcong.me/)
-
- ^: Equal contribution.
- *: Corresponding Author.
-
- [![made-for-VSCode](https://img.shields.io/badge/Made%20for-VSCode-1f425f.svg)](https://code.visualstudio.com/) [![Visits Badge](https://badges.strrl.dev/visits/jqtangust/hawk)](https://badges.strrl.dev)
-
-
-
- <img src="figs/icon.png" alt="Have eyes like a HAWK!" width="80">
- </div>
- </div>
-
- ## 🔍 **Motivation** - Have eyes like a Hawk!
- - 🚩 Current video anomaly detection (VAD) systems are often limited by their superficial semantic understanding of scenes and minimal user interaction.
- - 🚩 Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios.
-
- <div align="center">
- <img src="figs/motivation1.png" alt="Hawk">
- </div>
-
-
- ## 📢 **Updates**
-
- - ✅ Feb 24, 2025 - We release the **training and demo code** of **Hawk**.
- - ✅ Feb 24, 2025 - We release the **dataset (videos + annotations)** of **Hawk**. Check this Hugging Face link for [DOWNLOAD](https://huggingface.co/datasets/Jiaqi-hkust/hawk).
- - ✅ Sep 26, 2024 - **Hawk** is accepted by NeurIPS 2024.
- - ✅ June 29, 2024 - We release the **dataset (annotations)** of Hawk. Check this Google Drive link for [DOWNLOAD](https://drive.google.com/file/d/1WCnizldWZvtS4Yg5SX7ay5C3kUQfz-Eg/view?usp=sharing).
-
-
- ## ▶️ **Getting Started**
-
- ### 🪒 *Installation*
- - Create the environment with the following steps:
- ```
- apt install ffmpeg
- conda env create -f environment.yml
- conda activate hawk
- ```
-
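After the environment is created, a quick sanity check can confirm that `ffmpeg` and a CUDA-enabled PyTorch are visible from the `hawk` environment. This is only a minimal sketch for convenience; it assumes PyTorch is installed by `environment.yml`, which is not spelled out above:

```python
# Sanity check for the `hawk` environment (run with the env activated).
# Assumption: environment.yml installs PyTorch; adjust if your setup differs.
import shutil

import torch

print("ffmpeg found:", shutil.which("ffmpeg") is not None)  # needed for video decoding
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())         # training and the demo expect a GPU
```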
- ### 🏰 *Pretrained and Fine-tuned Model*
-
-
- - The following checkpoints are required to run Hawk (a download sketch follows the list):
-
- | Checkpoint | Link | Note |
- |:------------------|-------------|-------------|
- | Video-LLaMA-2-7B-Finetuned | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-2-7B-Finetuned/tree/main) | Used as initial weights for training. |
- | **Hawk_Pretrained** | [link](https://huggingface.co/Jiaqi-hkust/hawk) | Pretrained on the [WebVid](https://github.com/m-bain/webvid) dataset. |
- | **Hawk_Finetuned** | [link](https://huggingface.co/Jiaqi-hkust/hawk) | Fine-tuned on the [Hawk dataset](https://huggingface.co/datasets/Jiaqi-hkust/hawk). |
-
- - If you want to use the pretrained model, use the **Hawk_Pretrained** checkpoint.
- - If you want to use the model for anomaly understanding, use the **Hawk_Finetuned** checkpoint.
-
-
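Both Hawk checkpoints are hosted in the same Hugging Face model repository, so they can also be fetched programmatically. Below is a minimal sketch using `huggingface_hub`; the exact file names inside the repo are not documented here, so the whole repository is mirrored rather than a specific `.pth` file:

```python
# Hedged sketch: download the Hawk weights from the Hugging Face Hub.
# snapshot_download mirrors the full model repo locally; pick the checkpoint
# file you need (Pretrained or Finetuned) from the downloaded folder.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Jiaqi-hkust/hawk")
print("Hawk checkpoints downloaded to:", local_dir)
```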
- ## ⏳ **Demo**
-
- - The configuration file for the demo is [`eval.yaml`](/configs/eval_configs/eval.yaml).
-
- - Replace the following paths with your own:
- ```
- # Use LLaMA-2-chat as the base model
-
- # The checkpoint can be downloaded from Video-LLaMA-2-7B-Finetuned:
- # https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-2-7B-Finetuned
- llama_model: ".../Video-LLaMA-2-7B-Finetuned/llama-2-7b-chat-hf"
-
- # Hawk weights (Pretrained or Finetuned)
- ckpt: '.../checkpoint.pth'
- ```
-
- - Then, run the script:
- ```
- python app.py \
-     --cfg-path configs/eval_configs/eval.yaml \
-     --model_type llama_v2 \
-     --gpu-id 0
- ```
-
- - GUI:
- <div align="center">
- <img src="figs/demo.png" alt="Hawk">
- </div>
-
- ## 🖥️ **Training**
-
- ### 💾 *Dataset Preparation*
-
- - **For your convenience, we now provide the videos and annotations for the Hawk dataset. You can download them from Hugging Face: [DOWNLOAD](https://huggingface.co/datasets/Jiaqi-hkust/hawk) (see the sketch after the directory layout below).**
-
- - Traditional data acquisition method:
-
- - Download all video datasets from their original sources.
- 1. [CUHK_Avenue](https://www.cse.cuhk.edu.hk/leojia/projects/detectabnormal/dataset.html)
- 2. [DoTA](https://github.com/MoonBlvd/Detection-of-Traffic-Anomaly)
- 3. [Ped1](http://www.svcl.ucsd.edu/projects/anomaly/dataset.htm)
- 4. [Ped2](http://www.svcl.ucsd.edu/projects/anomaly/dataset.htm)
- 5. [ShanghaiTech](https://svip-lab.github.io/dataset/campus_dataset.html)
- 6. [UBNormal](https://github.com/lilygeorgescu/UBnormal/)
- 7. [UCF_Crime](https://www.crcv.ucf.edu/projects/real-world/)
-
- - Google Drive link to [DOWNLOAD](https://drive.google.com/file/d/1WCnizldWZvtS4Yg5SX7ay5C3kUQfz-Eg/view?usp=sharing) our annotations.
-
- - Data structure: each folder contains one annotation file (e.g., CUHK Avenue, DoTA, etc.). The `All_Mix` directory contains all datasets used for training and testing.
-
- - The dataset is organized as follows:
-
- ```
- (Hawk_data)
-
- Annotation
- ├── All_Mix
- │   ├── all_videos_all.json
- │   ├── all_videos_test.json
- │   └── all_videos_train.json
-
- ├── CUHK_Avenue
- │   └── Avenue.json
- ├── DoTA
- │   └── DoTA.json
- ├── Ped1
- │   ├── ...
- ├── ...
- └── UCF_Crime
- │   └── ...
-
- Videos
- ├── CUHK_Avenue
- │   └── Avenue.json
- ├── DoTA
- │   └── DoTA.json
- ├── Ped1
- │   ├── ...
- ├── ...
-
- readme
-
- ```
- Note: the data paths should be redefined to your own locations.
-
-
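As noted above, the videos and annotations are also mirrored on the Hugging Face Hub. Below is a minimal sketch of pulling the dataset repository and peeking at one annotation file; the in-repo folder layout and the JSON schema are assumptions based on the tree above, not a documented API:

```python
# Hedged sketch: fetch the Hawk dataset repo and inspect one annotation file.
# Assumptions: the dataset repo mirrors the layout shown above; the annotation
# schema is not specified in this README, so only the top-level structure is printed.
import json
from pathlib import Path

from huggingface_hub import snapshot_download

data_root = Path(snapshot_download(repo_id="Jiaqi-hkust/hawk", repo_type="dataset"))

ann_file = data_root / "Annotation" / "All_Mix" / "all_videos_train.json"  # path assumed from the tree above
if ann_file.exists():
    with ann_file.open() as f:
        annotations = json.load(f)
    print(type(annotations).__name__, "with", len(annotations), "entries")
else:
    print("Adjust ann_file to match the actual download layout under:", data_root)
```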
- ### 🔨 *Configuration*
-
- - The configuration files for [`training`](/configs/train_configs) cover two stages.
-
- - Replace the following paths with your own:
-
- ```
- llama_model: ".../Video-LLaMA-2-7B-Finetuned/llama-2-7b-chat-hf"
-
- # The checkpoint of the vision branch after stage-1 pretraining (only for stage 2)
- ckpt: ".../checkpoint.pth"
- ```
-
- ### 🖥️ *To Train*
-
- - Run the script:
- ```
- # for pretraining
- NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port='10000' train.py --cfg-path ./configs/train_configs/stage1_pretrain.yaml
-
- # for fine-tuning
- NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port='12001' train.py --cfg-path ./configs/train_configs/stage2_finetune.yaml
- ```
-
- *Resource usage for training (stage 1 and stage 2): 4 × RTX A6000 48 GB GPUs.*
-
- ## 🌐 **Citations**
-
- **The following is a BibTeX reference:**
-
- ```latex
- @inproceedings{atang2024hawk,
-   title = {Hawk: Learning to Understand Open-World Video Anomalies},
-   author = {Tang, Jiaqi and Lu, Hao and Wu, Ruizheng and Xu, Xiaogang and Ma, Ke and Fang, Cheng and Guo, Bin and Lu, Jiangbo and Chen, Qifeng and Chen, Ying-Cong},
-   year = {2024},
-   booktitle = {Neural Information Processing Systems (NeurIPS)}
- }
- ```
-
- ## 📧 **Connecting with Us?**
-
- If you have any questions, please feel free to send an email to `jtang092@connect.hkust-gz.edu.cn`.
-
-
- ## 📜 **Acknowledgment**
- This work is supported by the National Natural Science Foundation of China (No. 62206068) and the Natural Science Foundation of Zhejiang Province, China under No. LD24F020002.
-
- Also, this project is inspired by [Video-LLaMA](https://github.com/DAMO-NLP-SG/Video-LLaMA).
 
+ ---
+ title: Hawk
+ emoji: 🦫
+ colorFrom: yellow
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 5.17.1
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ short_description: Anomaly Understanding.
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference