Commit b6bd381 (verified) · YuanLiuuuuuu committed · 1 Parent(s): 8505c38

Update README.md

Files changed (1):
  1. README.md +125 -22
README.md CHANGED
@@ -22,6 +22,7 @@ We are delighted to announce that the WePOINTS family has welcomed a new member:

## News

+ - 2025.08.27: Added support for deploying POINTS-Reader with SGLang💪💪💪.
- 2025.08.26: We released the weights of the most recent version of POINTS-Reader🔥🔥🔥.
- 2025.08.21: POINTS-Reader is accepted by **EMNLP 2025** for presentation at the **Main Conference**🎉🎉🎉.

@@ -37,7 +38,7 @@ We are delighted to announce that the WePOINTS family has welcomed a new member:

## Results

- We take the following results from [OmniDocBench](https://github.com/opendatalab/OmniDocBench/tree/main) and POINTS-Reader for comparison:
+ For comparison, we use the results reported by [OmniDocBench](https://github.com/opendatalab/OmniDocBench/tree/main) and POINTS-Reader. Compared with the version submitted to EMNLP 2025, the current release provides (1) improved performance and (2) support for Chinese documents. Both enhancements build upon the methods proposed in this paper.

<table style="width: 92%; margin: auto; border-collapse: collapse;">
<thead>
@@ -225,8 +226,8 @@ We take the following results from [OmniDocBench](https://github.com/opendatalab
<td>0.641</td>
</tr>
<tr>
- <td rowspan="10">Expert VLMs</td>
- <td>POINTS-Reader-3B</td>
+ <td rowspan="11">Expert VLMs</td>
+ <td><strong style="color: green;">POINTS-Reader-3B</strong></td>
<td>0.133</td>
<td>0.212</td>
<td>0.062</td>
@@ -561,7 +562,7 @@ The following code snippet has been tested with the following environment:
```
python==3.10.12
torch==2.5.1
- transformers==4.46.1
+ transformers==4.55.2
cuda==12.1
```
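
As a quick check against the environment pinned above, the installed versions and CUDA runtime can be printed from Python. A minimal sketch (not part of this commit; it only mirrors the versions listed in the fence above):

```python
# Print the locally installed versions to compare against the pinned environment.
import sys

import torch
import transformers

print(f'python=={sys.version.split()[0]}')          # pinned: 3.10.12
print(f'torch=={torch.__version__}')                # pinned: 2.5.1
print(f'transformers=={transformers.__version__}')  # pinned: 4.55.2
print(f'cuda=={torch.version.cuda}',                # pinned: 12.1
      f'(available: {torch.cuda.is_available()})')
```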
 
@@ -569,17 +570,8 @@ If you encounter environment issues, please feel free to open an issue.

### Run with Transformers

- Before you run the following code, make sure you install the `WePOINTS` package by running:
-
- ```
- git clone https://github.com/WePOINTS/WePOINTS.git
- cd WePOINTS
- pip install -e .
- ```
-
```python
- from wepoints.utils.images import Qwen2ImageProcessorForPOINTSV15
- from transformers import AutoModelForCausalLM, AutoTokenizer
+ from transformers import AutoModelForCausalLM, AutoTokenizer, Qwen2VLImageProcessor
import torch


@@ -593,11 +585,11 @@ prompt = (
image_path = '/path/to/your/local/image'
model_path = 'tencent/POINTS-Reader'
model = AutoModelForCausalLM.from_pretrained(model_path,
-                                              trust_remote_code=True,
-                                              torch_dtype=torch.float16,
-                                              device_map='cuda')
+                                              trust_remote_code=True,
+                                              torch_dtype=torch.float16,
+                                              device_map='cuda')
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
- image_processor = Qwen2ImageProcessorForPOINTSV15.from_pretrained(model_path)
+ image_processor = Qwen2VLImageProcessor.from_pretrained(model_path)
content = [
    dict(type='image', image=image_path),
    dict(type='text', text=prompt)
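# --- Hypothetical sketch, not part of this commit: the diff context stops
# --- inside `content` (its closing bracket lies outside the hunk). In the
# --- WePOINTS family the loaded model, tokenizer and image processor are then
# --- passed to a chat-style generation call roughly like the following; the
# --- `model.chat` signature and generation-config keys are assumptions.
messages = [{'role': 'user', 'content': content}]
generation_config = {
    'max_new_tokens': 2048,
    'temperature': 0.0,
}
response = model.chat(messages, tokenizer, image_processor, generation_config)
print(response)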
@@ -629,12 +621,123 @@ If you encounter issues like repetition, please try to increase the resolution of

### Deploy with SGLang

- We will create a Pull Request to SGLang, please stay tuned.
+ We have created a [Pull Request](https://github.com/sgl-project/sglang/pull/9651) for SGLang. Until the PR is merged, you can check out its branch and install SGLang in editable mode by following the [official guide](https://docs.sglang.ai/get_started/install.html).
+
+ #### How to Deploy
+
+ You can deploy POINTS-Reader with SGLang using the following command:
+
+ ```
+ python3 -m sglang.launch_server \
+     --model-path tencent/POINTS-Reader \
+     --tp-size 1 \
+     --dp-size 1 \
+     --chat-template points-v15-chat \
+     --trust-remote-code \
+     --port 8081
+ ```
+
+ #### How to Use
+
+ You can use the following code to obtain results from SGLang:
+
+ ```python
+ from typing import List
+ import requests
+ import json
+
+
+ def call_wepoints(messages: List[dict],
+                   temperature: float = 0.0,
+                   max_new_tokens: int = 2048,
+                   repetition_penalty: float = 1.05,
+                   top_p: float = 0.8,
+                   top_k: int = 20,
+                   do_sample: bool = True,
+                   url: str = 'http://127.0.0.1:8081/v1/chat/completions') -> str:
+     """Query the WePOINTS model to generate a response.
+
+     Args:
+         messages (List[dict]): A list of messages to be sent to WePOINTS. The
+             messages should be standard OpenAI messages, like:
+             [
+                 {
+                     'role': 'user',
+                     'content': [
+                         {
+                             'type': 'text',
+                             'text': 'Please describe this image in short'
+                         },
+                         {
+                             'type': 'image_url',
+                             'image_url': {'url': '/path/to/image.jpg'}
+                         }
+                     ]
+                 }
+             ]
+         temperature (float, optional): The temperature of the model.
+             Defaults to 0.0.
+         max_new_tokens (int, optional): The maximum number of new tokens to
+             generate. Defaults to 2048.
+         repetition_penalty (float, optional): The penalty for repetition.
+             Defaults to 1.05.
+         top_p (float, optional): The top-p probability threshold.
+             Defaults to 0.8.
+         top_k (int, optional): The top-k sampling vocabulary size.
+             Defaults to 20.
+         do_sample (bool, optional): Whether to use sampling or greedy decoding.
+             Defaults to True.
+         url (str, optional): The URL of the WePOINTS model.
+             Defaults to 'http://127.0.0.1:8081/v1/chat/completions'.
+
+     Returns:
+         str: The generated response from WePOINTS.
+     """
+     data = {
+         'model': 'WePoints',
+         'messages': messages,
+         'max_new_tokens': max_new_tokens,
+         'temperature': temperature,
+         'repetition_penalty': repetition_penalty,
+         'top_p': top_p,
+         'top_k': top_k,
+         'do_sample': do_sample,
+     }
+     response = requests.post(url, json=data)
+     response = json.loads(response.text)
+     response = response['choices'][0]['message']['content']
+     return response
+
+
+ prompt = (
+     'Please extract all the text from the image with the following requirements:\n'
+     '1. Return tables in HTML format.\n'
+     '2. Return all other text in Markdown format.'
+ )
+
+ messages = [{
+     'role': 'user',
+     'content': [
+         {
+             'type': 'text',
+             'text': prompt
+         },
+         {
+             'type': 'image_url',
+             'image_url': {'url': '/path/to/image.jpg'}
+         }
+     ]
+ }]
+ response = call_wepoints(messages)
+ print(response)
+ ```
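
Because `sglang.launch_server` exposes an OpenAI-compatible chat-completions endpoint, the `How to Use` helper added above can also be replaced with the official `openai` Python client. A minimal sketch (not part of this commit; the model name, port, and prompt simply mirror the snippet above):

```python
# OpenAI-client alternative to the requests-based helper above.
# Assumes the SGLang server from "How to Deploy" is listening on port 8081.
from openai import OpenAI

client = OpenAI(base_url='http://127.0.0.1:8081/v1', api_key='EMPTY')

prompt = (
    'Please extract all the text from the image with the following requirements:\n'
    '1. Return tables in HTML format.\n'
    '2. Return all other text in Markdown format.'
)

response = client.chat.completions.create(
    model='WePoints',  # same model name used by the requests-based helper
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': prompt},
            {'type': 'image_url', 'image_url': {'url': '/path/to/image.jpg'}},
        ],
    }],
    temperature=0.0,
    max_tokens=2048,
)
print(response.choices[0].message.content)
```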
 
## Known Issues

- - **Complex Document Parsing**: POINTS-Reader can struggle with complex layouts (e.g., newspapers), often producing repeated or missing content.
- - **Handwritten Document Parsing**: It also has difficulty handling handwritten inputs (e.g., receipts, notes), which can lead to recognition errors or omissions.
+ - **Complex Document Parsing**: POINTS-Reader can struggle with complex layouts (e.g., newspapers), often producing repeated or missing content.
+ - **Handwritten Document Parsing**: It also has difficulty handling handwritten inputs (e.g., receipts, notes), which can lead to recognition errors or omissions.
- **Multi-language Document Parsing**: POINTS-Reader currently supports only English and Chinese, limiting its effectiveness on other languages.

## Citation
@@ -645,7 +748,7 @@ If you use this model in your work, please cite the following paper:
@article{points-reader,
title={POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion},
author={Liu, Yuan and Zhongyin Zhao and Tian, Le and Haicheng Wang and Xubing Ye and Yangxiu You and Zilin Yu and Chuhan Wu and Zhou, Xiao and Yu, Yang and Zhou, Jie},
- journal={},
+ journal={EMNLP2025},
year={2025}
}
 