qianliyx committed · verified · Commit 7f09969 · 1 parent: afb7420

Upload 2 files

Files changed (2):
1. README_en.md (+166)
2. requirements.txt (+8)
README_en.md ADDED
# AnyOCR

```
    ___                 ____  __________
   /   |  ____  __  __/ __ \/ ____/ __ \
  / /| | / __ \/ / / / / / / /   / /_/ /
 / ___ |/ / / / /_/ / /_/ / /___/ _, _/
/_/  |_/_/ /_/\__, /\____/\____/_/ |_|
             /____/
```

English | [简体中文](./README.md)

## 1. Introduction

We are pleased to release AnyOCR, an OCR tool distributed in ONNX format and compatible with multiple platforms. Its core highlight is the use of ONNXRuntime as the inference engine, which runs more efficiently and stably than the PaddlePaddle inference engine.

- GitHub repository: [AnyOCR](https://github.com/oriforge/anyocr)

## 2. Origin

The PaddlePaddle team's PaddleOCR project provides an OCR tool built on PaddlePaddle with strong performance and functionality. However, in certain scenarios the PaddlePaddle inference engine has speed and stability issues. We therefore collected a large amount of new OCR data to fine-tune and optimize PaddleOCR, exported the models to ONNX format, and run inference directly with ONNXRuntime, avoiding the pitfalls of the PaddlePaddle inference engine while supporting CPU, GPU, and other backends.

PaddleOCR does not perform well on some new or domain-specific data, so we collected a large amount of data for fine-tuning, covering various fields, including:

- cc-ocr
- Industrial
- Medical
- Physical examination
- Chinese
- English
- Paper
- Network
- Self-built
- etc.

Total dataset size: greater than `385K`.

### Extended training

- Training set: `385K`
- Test set: `5K`
- Accuracy: `0.952`

### Model introduction

- Detection model: `anyocr_det_ch_v4_lite.onnx`, fine-tuned from `ch_PP-OCRv4_det` on our dataset.
- Recognition model: `anyocr_rec_v4_server.onnx`, fine-tuned from `ch_PP-OCRv4_server_rec` on our dataset.
- Direction classification model: `anyocr_cls_v4.onnx`, taken from `ch_ppocr_mobile_v2.0_cls` without additional training.
- Character dictionary: `anyocr_keys_v4.txt`, derived from `ppocr/utils/ppocr_keys_v1.txt`.

### Evaluation

Self-built evaluation set: `1.1K`

We extracted 1,150 pairs of untrained data for evaluation, covering Chinese, English, numbers, symbols, etc.

Accuracy on our evaluation set compared with other OCR tools:

- AnyOCR (ours): 0.97
- PaddleOCR: 0.92
- Ali DuGuang OCR: 0.86
- GOT-OCR2.0: 0.89
- olmOCR: 0.46

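As a rough sketch, the exact-match accuracy figures above can be computed over (prediction, ground truth) pairs like this (illustrative code with toy data, not part of the AnyOCR package):

```python
def exact_match_accuracy(pairs):
    """Fraction of (prediction, ground_truth) pairs that match exactly
    after trimming surrounding whitespace."""
    if not pairs:
        return 0.0
    hits = sum(1 for pred, gt in pairs if pred.strip() == gt.strip())
    return hits / len(pairs)

# Toy example with 4 evaluation pairs: 3 correct, 1 wrong.
pairs = [
    ("AnyOCR", "AnyOCR"),
    ("hello world", "hello world"),
    ("12345", "12345"),
    ("0CR", "OCR"),  # misread character
]
print(exact_match_accuracy(pairs))  # 0.75
```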
## 3. Usage

### Install dependencies

```bash
pip install -r requirements.txt
```

### Usage

```python
# Simple usage
# use_det: enable text detection
# use_cls: enable text orientation classification
# use_rec: enable text recognition

from anyocr.pipeline import anyocr

model = anyocr()

res = model.raw_completions('/to/your/image', use_cls=True, use_det=True)
print(res)


# Custom models
from anyocr.pipeline import anyocr
from anyocr.pipeline import anyocrConfig

config = anyocrConfig(
    det_model_path="anyocr/models/anyocr_det_ch_v4_lite.onnx",
    rec_model_path="anyocr/models/anyocr_rec_v4_server.onnx",
    cls_model_path="anyocr/models/anyocr_cls_v4.onnx",
    rec_keys_path="anyocr/models/anyocr_keys_v4.txt",
)
config = config.model_dump()
model = anyocr(config)

res = model.raw_completions('/to/your/image', use_cls=True, use_det=True)
print(res)
```

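Since most of the code derives from RapidOCR, the result can likely be post-processed line by line. The sketch below assumes a RapidOCR-style result of (box, text, score) triples; that format is an assumption, and the actual structure returned by `raw_completions` may differ:

```python
# Hypothetical post-processing sketch. It ASSUMES `res` is a list of
# (box, text, score) triples as in RapidOCR-style outputs; the actual
# structure returned by anyocr may differ.
def filter_by_score(results, text_score=0.5):
    """Keep only recognized lines whose confidence is >= text_score."""
    return [(box, text, score) for box, text, score in results
            if score >= text_score]

# Toy stand-in for a recognition result:
res = [
    ([[0, 0], [10, 0], [10, 5], [0, 5]], "hello", 0.98),
    ([[0, 6], [10, 6], [10, 11], [0, 11]], "w0rld", 0.31),
]
kept = filter_by_score(res, text_score=0.5)
print([text for _, text, _ in kept])  # ['hello']
```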
- If you have a better text detection or text recognition model, you can also use only part of ours.
- You can also export a PaddleOCR model to ONNX format and run it with AnyOCR inference, or fine-tune a PaddleOCR model yourself and run it with AnyOCR.


### Configuration

```python
from typing import List, Optional

from pydantic import BaseModel

class anyocrConfig(BaseModel):
    text_score: float = 0.5  # Confidence threshold for text recognition results. Range: [0, 1]
    use_det: bool = True  # Enable text detection
    use_cls: bool = True  # Enable text orientation classification
    use_rec: bool = True  # Enable text recognition
    print_verbose: bool = False  # Verbose output
    min_height: int = 30  # Minimum image height (pixels); below this, text detection is skipped and recognition runs directly
    width_height_ratio: float = 8  # If the image's width/height ratio exceeds this value, text detection is skipped and recognition runs directly
    max_side_len: int = 2000  # If the longest image side exceeds this value, it is scaled down to max_side_len, preserving the aspect ratio
    min_side_len: int = 30  # If the shortest image side is below this value, it is scaled up to min_side_len, preserving the aspect ratio
    return_word_box: bool = False  # Whether to return per-character coordinates

    det_use_cuda: bool = False  # Use GPU for detection
    det_model_path: Optional[str] = None  # Text detection model path
    det_limit_side_len: float = 736  # Pixel limit on image side length
    det_limit_type: str = "min"  # Apply det_limit_side_len to the shortest or longest side. Range: [min, max]
    det_max_candidates: int = 1000  # Maximum number of candidate boxes
    det_thresh: float = 0.3  # Segmentation threshold between text and background; larger values shrink the text region. Range: [0, 1]
    det_box_thresh: float = 0.5  # Threshold for keeping a detected box; larger values lower recall. Range: [0, 1]
    det_unclip_ratio: float = 1.6  # Controls detection box size; larger values enlarge the box. Range: [1.6, 2.0]
    det_donot_use_dilation: bool = False  # Disable morphological dilation of the detected text area
    det_score_mode: str = "slow"  # Method for scoring a text box. Range: [slow, fast]

    cls_use_cuda: bool = False  # Use GPU for classification
    cls_model_path: Optional[str] = None  # Text orientation classification model path
    cls_image_shape: List[int] = [3, 48, 192]  # Input image shape for the direction classifier (CHW)
    cls_label_list: List[str] = ["0", "180"]  # Direction labels (0° or 180°); do not change
    cls_batch_num: int = 6  # Batch size for classification inference; usually left at the default, as larger values rarely speed things up and may hurt performance
    cls_thresh: float = 0.9  # Confidence threshold for direction classification results. Range: [0, 1]

    rec_use_cuda: bool = False  # Use GPU for recognition
    rec_keys_path: Optional[str] = None  # Text recognition character dictionary path
    rec_model_path: Optional[str] = None  # Text recognition model path
    rec_img_shape: List[int] = [3, 48, 320]  # Input image shape for the recognition model (CHW)
    rec_batch_num: int = 6  # Batch size for recognition inference; usually left at the default, as larger values rarely speed things up and may hurt performance
```

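To illustrate how `min_height`, `width_height_ratio`, and `max_side_len` interact, here is a minimal sketch of the documented preprocessing rules; the real implementation inside anyocr may differ in detail:

```python
# Illustrative sketch of the documented preprocessing rules; the actual
# anyocr implementation may differ.
def skip_detection(height, width, min_height=30, width_height_ratio=8.0):
    """Detection is skipped for very short or very wide images,
    per the min_height and width_height_ratio settings."""
    return height < min_height or (width / height) > width_height_ratio

def clamp_longest_side(height, width, max_side_len=2000):
    """Scale down so the longest side is at most max_side_len,
    preserving the aspect ratio."""
    longest = max(height, width)
    if longest <= max_side_len:
        return height, width
    scale = max_side_len / longest
    return round(height * scale), round(width * scale)

print(skip_detection(20, 100))         # True  (height < 30)
print(skip_detection(100, 900))        # True  (ratio 9 > 8)
print(skip_detection(100, 400))        # False
print(clamp_longest_side(3000, 1500))  # (2000, 1000)
```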
## Special Thanks

- `paddleocr`, for providing the original models and fine-tuning tutorials
- `RapidOCR`, from which most of the source code comes, with some personal changes

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=oriforge/anyocr&type=Date)](https://www.star-history.com/#oriforge/anyocr&Date)
requirements.txt ADDED
pyclipper>=1.2.0
numpy>=1.19.5,<3.0.0
six>=1.15.0
Shapely>=1.7.1,!=2.0.4  # 2.0.4 is buggy on Python 3.12
onnxruntime>=1.7.0
PyYAML
Pillow
opencv_python