microsoft/GUI-Actor-Verifier-2B


qianhuiwu committed 7 days ago

Commit cc3ad83 · verified · 1 parent: c8a234a

update paper link.

Files changed (1)

README.md +3 -3


@@ -13,7 +13,7 @@ This model was introduced in the paper [**GUI-Actor: Coordinate-Free Visual Grou
 It is developed based on [UI-TARS-2B-SFT](https://huggingface.co/ByteDance-Seed/UI-TARS-2B-SFT) and is designed to predict the correctness of an action position given a language instruction. This model is well-suited for **GUI-Actor**, as its attention map effectively provides diverse candidates for verification with only a single inference.
-For more details on model design and evaluation, please check: [🏠 Project Page](https://aka.ms/GUI-Actor) | [💻 Github Repo](https://github.com/microsoft/GUI-Actor) | [📑 Paper]().
+For more details on model design and evaluation, please check: [🏠 Project Page](https://aka.ms/GUI-Actor) | [💻 Github Repo](https://github.com/microsoft/GUI-Actor) | [📑 Paper](https://www.arxiv.org/pdf/2506.03143).
 | Model List                                  | Hugging Face Link                         |
@@ -194,9 +194,9 @@ answer = ground_only_positive(model, tokenizer, processor, instruction, image, p
     title={GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents},
     author={Qianhui Wu and Kanzhi Cheng and Rui Yang and Chaoyun Zhang and Jianwei Yang and Huiqiang Jiang and Jian Mu and Baolin Peng and Bo Qiao and Reuben Tan and Si Qin and Lars Liden and Qingwei Lin and Huan Zhang and Tong Zhang and Jianbing Zhang and Dongmei Zhang and Jianfeng Gao},
     year={2025},
-    eprint={},
+    eprint={2506.03143},
     archivePrefix={arXiv},
     primaryClass={cs.CV},
-    url={},
+    url={https://www.arxiv.org/pdf/2506.03143},
 }
 ```