nielsr HF Staff committed on
Commit
df12433
·
verified ·
1 Parent(s): d89bcaa

Improve model card with paper link and clarify metadata

This PR improves the model card by:

- Adding a link to the paper on the Hugging Face website in the introduction.
- Correcting the metadata section to ensure consistency.


This addresses several issues to improve the overall clarity and completeness of the model card.

Files changed (1)

1. README.md +25 -217
README.md CHANGED
@@ -1,7 +1,13 @@
 ---
-license: apache-2.0
+base_model:
+- Qwen/Qwen2.5-VL-3B-Instruct
 datasets:
 - remyxai/OpenSpaces
+language:
+- en
+library_name: transformers
+license: apache-2.0
+pipeline_tag: image-text-to-text
 tags:
 - remyx
 - vqasynth
@@ -13,13 +19,7 @@ tags:
 - distance-estimation
 - embodied-ai
 - quantitative-spatial-reasoning
-base_model:
-- Qwen/Qwen2.5-VL-3B-Instruct
-language:
-- en
-pipeline_tag: image-text-to-text
 new_version: remyxai/SpaceThinker-Qwen2.5VL-3B
-library_name: transformers
 model-index:
 - name: SpaceQwen2.5-VL-3B-Instruct
   results:
@@ -31,244 +31,52 @@ model-index:
       type: benchmark
     metrics:
     - type: success_rate
+      value: 0.515
       name: Overall Success Rate
-      value: 0.5150
-      results_by_subcategory:
-      - name: 3D Positional Relation / Orientation
-        success_rate: 0.4706
-      - name: Object Localization / 3D Localization
-        success_rate: 0.5629
-      - name: Object Properties / Size
-        success_rate: 0.5116
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: BLINK
-      type: benchmark
-    metrics:
     - type: success_rate
+      value: 0.5
       name: Overall Success Rate
-      value: 0.5000
-      results_by_subcategory:
-      - name: 3D Positional Relation / Orientation
-        success_rate: 0.6503
-      - name: Counting / Object Counting
-        success_rate: 0.6083
-      - name: Depth and Distance / Relative
-        success_rate: 0.5161
-      - name: Object Localization / 2D Localization
-        success_rate: 0.4426
-      - name: Point and Object Tracking / Point Correspondence
-        success_rate: 0.2849
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: MMIU
-      type: benchmark
-    metrics:
     - type: success_rate
-      name: Overall Success Rate
       value: 0.3045
+      name: Overall Success Rate
-      results_by_subcategory:
-      - name: Camera and Image Transformation / 2D Transformation
-        success_rate: 0.245
-      - name: Camera and Image Transformation / 3D Camera Pose
-        success_rate: 0.215
-      - name: Camera and Image Transformation / Camera Motion
-        success_rate: 0.4436
-      - name: Depth and Distance / Absolute
-        success_rate: 0.265
-      - name: Object Localization / 3D Localization
-        success_rate: 0.480
-      - name: Point and Object Tracking / 3D Tracking
-        success_rate: 0.240
-      - name: Point and Object Tracking / Point Correspondence
-        success_rate: 0.280
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: MMVP
-      type: benchmark
-    metrics:
     - type: success_rate
+      value: 0.5767
       name: Overall Success Rate
-      value: 0.5767
-      results_by_subcategory:
-      - name: Others / Miscellaneous
-        success_rate: 0.5767
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: QSpatialBench-Plus
-      type: benchmark
-    metrics:
     - type: success_rate
+      value: 0.3663
       name: Overall Success Rate
-      value: 0.3663
-      results_by_subcategory:
-      - name: Depth and Distance / Absolute
-        success_rate: 0.3663
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: QSpatialBench-ScanNet
-      type: benchmark
-    metrics:
     - type: success_rate
+      value: 0.33
       name: Overall Success Rate
-      value: 0.3300
-      results_by_subcategory:
-      - name: Depth and Distance / Absolute
-        success_rate: 0.2160
-      - name: Object Properties / Size
-        success_rate: 0.4444
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: RealWorldQA
-      type: benchmark
-    metrics:
     - type: success_rate
+      value: 0.4392
       name: Overall Success Rate
-      value: 0.4392
-      results_by_subcategory:
-      - name: Others / Miscellaneous
-        success_rate: 0.4392
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: SpatialSense
-      type: benchmark
-    metrics:
     - type: success_rate
+      value: 0.6554
       name: Overall Success Rate
-      value: 0.6554
-      results_by_subcategory:
-      - name: 3D Positional Relation / Orientation
-        success_rate: 0.6554
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: VGBench
-      type: benchmark
-    metrics:
     - type: success_rate
+      value: 0.2615
       name: Overall Success Rate
-      value: 0.2615
-      results_by_subcategory:
-      - name: Camera and Image Transformation / 2D Transformation
-        success_rate: 0.2277
-      - name: Camera and Image Transformation / 3D Camera Pose
-        success_rate: 0.2438
-      - name: Depth and Distance / Absolute
-        success_rate: 0.2696
-      - name: Depth and Distance / Relative
-        success_rate: 0.1945
-      - name: Object Localization / 3D Localization
-        success_rate: 0.3733
-      - name: Point and Object Tracking / 3D Tracking
-        success_rate: 0.2655
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: VSI-Bench_8
-      type: benchmark
-    metrics:
     - type: success_rate
+      value: 0.2322
       name: Overall Success Rate
-      value: 0.2322
-      results_by_subcategory:
-      - name: 3D Positional Relation / Orientation
-        success_rate: 0.3843
-      - name: Counting / Object Counting
-        success_rate: 0.1715
-      - name: Depth and Distance / Absolute
-        success_rate: 0.0299
-      - name: Depth and Distance / Relative
-        success_rate: 0.3521
-      - name: Object Properties / Size
-        success_rate: 0.2323
-      - name: Others / Miscellaneous
-        success_rate: 0.2525
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: VSR-ZeroShot
-      type: benchmark
-    metrics:
     - type: success_rate
+      value: 0.7373
       name: Overall Success Rate
-      value: 0.7373
-      results_by_subcategory:
-      - name: 3D Positional Relation / Orientation
-        success_rate: 0.7373
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: cvbench
-      type: benchmark
-    metrics:
     - type: success_rate
+      value: 0.5179
       name: Overall Success Rate
-      value: 0.5179
-      results_by_subcategory:
-      - name: Counting / Object Counting
-        success_rate: 0.6168
-      - name: Depth and Distance / Relative
-        success_rate: 0.4925
-      - name: Object Localization / 3D Localization
-        success_rate: 0.4446
-
-  - task:
-      type: visual-question-answering
-      name: Spatial Reasoning
-    dataset:
-      name: spatialbench
-      type: benchmark
-    metrics:
     - type: success_rate
-      name: Overall Success Rate
       value: 0.4879
+      name: Overall Success Rate
-      results_by_subcategory:
-      - name: 3D Positional Relation / Orientation
-        success_rate: 0.5294
-      - name: Counting / Object Counting
-        success_rate: 0.7000
-      - name: Object Properties / Existence
-        success_rate: 0.4500
-      - name: Object Properties / Reachability
-        success_rate: 0.5000
-      - name: Object Properties / Size
-        success_rate: 0.2500
-
 ---
 
 <img src="https://cdn-uploads.huggingface.co/production/uploads/647777304ae93470ffc28913/v4edJliSy46xBA8g5ZXf8.png" width="500"/>
 
 # SpaceQwen2.5-VL-3B-Instruct
 
+The model was presented in the paper [OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models](https://huggingface.co/papers/2506.03135). More information can be found at the [project page](https://qizekun.github.io/omnispatial/).
+
 
 - **Model Type:** Multimodal, Vision-Language Model
 - **Architecture**: `Qwen2.5-VL-3B-Instruct`
@@ -279,8 +87,8 @@ model-index:
 
 ### Model Overview
 
-This model uses data synthesis techniques and publically available models to reproduce the work described in SpatialVLM to enhance the spatial reasoning of multimodal models.
-With a pipeline of expert models, we can infer spatial relationships between objects in a scene to create VQA dataset for spatial reasoning.
+This model uses data synthesis techniques and publicly available models to reproduce the work described in SpatialVLM to enhance the spatial reasoning of multimodal models.
+With a pipeline of expert models, we can infer spatial relationships between objects in a scene to create a VQA dataset for spatial reasoning.
 
 
 ## Running SpaceQwen2.5-VL-3B-Instruct
@@ -389,7 +197,7 @@ The following chart compares performance between **SpaceQwen** and **SpaceThinke
 
 ## OmniSpatial
 
-**OmniSpatial** is another comprehensive spatial reasoning benchmark assesses dynamic reasoning, complex spatial logic, spatial interaction, and perspective-taking capabilities.
+**OmniSpatial** is another comprehensive spatial reasoning benchmark that assesses dynamic reasoning, complex spatial logic, spatial interaction, and perspective-taking capabilities.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/647777304ae93470ffc28913/EDHmFRztyTI-lhdgEYZzP.png)
 
 Learn more about [OmniSpatial](https://qizekun.github.io/omnispatial/).
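The card's "Running SpaceQwen2.5-VL-3B-Instruct" code falls outside this diff's context, but the `pipeline_tag: image-text-to-text` metadata corrected here implies the standard Qwen2.5-VL chat-message input format. The sketch below builds only that payload; `build_spatial_query`, the image URL, and the question are hypothetical names for illustration, and the commented-out loading calls are the usual `transformers` entry points rather than code from this model card.

```python
# Sketch of the chat-format payload consumed by Qwen2.5-VL-family
# processors for image-text-to-text inference. Only the message
# structure is built here; no model download is performed.

def build_spatial_query(image_url: str, question: str) -> list:
    """Return a single-turn conversation pairing an image with a
    spatial-reasoning question (hypothetical helper for illustration)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_spatial_query(
    "https://example.com/warehouse.png",  # hypothetical image URL
    "How far is the pallet from the forklift?",
)

# With transformers installed, this payload would typically be fed to:
#   processor = AutoProcessor.from_pretrained("remyxai/SpaceQwen2.5-VL-3B-Instruct")
#   text = processor.apply_chat_template(messages, tokenize=False,
#                                        add_generation_prompt=True)
print(messages[0]["content"][1]["text"])
```

The nested `content` list (one `image` entry, one `text` entry) is what `apply_chat_template` expects for multimodal turns; adding more images or turns extends the same structure.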