Commit 2eac01a
Parent(s): 1c43e67
Revise documentation in app.py for Deep‑Research PDF Field Extractor, enhancing clarity on system architecture, core components, processing pipeline, and key features. Update usage instructions and support resources for better user guidance.
src/app.py
CHANGED
@@ -62,57 +62,104 @@ page = st.sidebar.radio("Go to", ["Documentation", "Traces", "Execution"])
 
 # Documentation Page
 if page == "Documentation":
-    st.title("Deep‑Research PDF Field Extractor
+    st.title("Deep‑Research PDF Field Extractor")
 
     st.markdown("""
-[old lines 68-94 removed; their content was not preserved in this capture]
+    ## Overview
+    This system uses a multi-agent architecture to extract fields from PDFs with high accuracy and reliability.
+
+    ### Core Components
+
+    1. **Planner**
+       - Generates execution plans using Azure OpenAI
+       - Determines optimal extraction strategy
+       - Manages task dependencies
+
+    2. **Executor**
+       - Executes the generated plan
+       - Manages agent execution flow
+       - Handles context and result management
+
+    3. **Agents**
+       - `TableAgent`: Extracts text and tables using Azure Document Intelligence
+       - `FieldMapper`: Maps fields to values using extracted content
+       - `ForEachField`: Controls field iteration flow
+
+    ### Processing Pipeline
+
+    1. **Document Processing**
+       - Text and table extraction using Azure Document Intelligence
+       - Layout and structure preservation
+       - Support for complex document formats
+
+    2. **Field Extraction**
+       - Document type inference
+       - User profile determination
+       - Page-by-page scanning
+       - Value extraction and validation
+
+    3. **Context Building**
+       - Document metadata
+       - Field descriptions
+       - User context
+       - Execution history
+
+    ### Key Features
+
+    #### Smart Field Extraction
+    - Two-step extraction strategy:
       1. Page-by-page scanning for precise extraction
       2. Semantic search fallback if no value found
-[old lines 97-115 removed; their content was not preserved in this capture]
+    - Basic context awareness for improved extraction
+    - Support for tabular data extraction
+
+    #### Document Intelligence
+    - Azure Document Intelligence integration
+    - Layout and structure preservation
+    - Table extraction and formatting
+    - Complex document handling
+
+    #### Execution Monitoring
+    - Detailed execution traces
+    - Success/failure status
+    - Comprehensive logging
+    - Result storage and retrieval
+
+    ### Technical Requirements
+
+    - Azure OpenAI API key
+    - Azure Document Intelligence endpoint
+    - Python 3.9 or higher
+    - Required Python packages (see requirements.txt)
+
+    ### Getting Started
+
+    1. **Upload Your PDF**
+       - Click the "Upload PDF" button
+       - Select your PDF file
+
+    2. **Specify Fields**
+       - Enter comma-separated field names
+       - Example: `Date, Name, Value, Location`
+
+    3. **Optional: Add Field Descriptions**
+       - Provide YAML-formatted field descriptions
+       - Helps improve extraction accuracy
+
+    4. **Run Extraction**
+       - Click "Run extraction"
+       - Monitor progress in execution trace
+       - View results in table format
+
+    5. **Download Results**
+       - Export as CSV
+       - View detailed execution logs
+
+    ### Support
+
+    For detailed technical documentation, please refer to:
+    - [Architecture Overview](ARCHITECTURE.md)
+    - [Developer Documentation](DEVELOPER.md)
     """)
 
 # Traces Page
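Note on the "Smart Field Extraction" text in this change: the two-step strategy it describes (page-by-page scan first, semantic search only if nothing is found) could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this repository. The names `scan_pages`, `fallback_search`, and `extract_field` are hypothetical, the page scan is a plain regex for a `Field: value` line, and a simple token-overlap score stands in for real semantic (embedding-based) search.

```python
import re
from typing import Optional


def scan_pages(pages: list[str], field: str) -> Optional[str]:
    """Step 1 (hypothetical): look on each page for a literal 'Field: value' line."""
    pattern = re.compile(
        rf"^{re.escape(field)}\s*:\s*(.+)$", re.IGNORECASE | re.MULTILINE
    )
    for page in pages:
        match = pattern.search(page)
        if match:
            return match.group(1).strip()
    return None


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used by the stand-in fallback scorer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fallback_search(pages: list[str], query: str) -> Optional[str]:
    """Step 2 (stand-in for semantic search): line with the most shared tokens."""
    query_tokens = _tokens(query)
    best_line, best_score = None, 0
    for page in pages:
        for line in page.splitlines():
            score = len(query_tokens & _tokens(line))
            if score > best_score:
                best_line, best_score = line.strip(), score
    return best_line


def extract_field(pages: list[str], field: str, description: str = ""):
    """Try the page scan first; fall back only when it finds no value."""
    value = scan_pages(pages, field)
    if value is not None:
        return value, "page_scan"
    hit = fallback_search(pages, f"{field} {description}")
    if hit is not None:
        return hit, "semantic_fallback"
    return None, "not_found"
```

Returning the step that produced the value alongside the value itself is one way to feed the execution trace the documentation mentions; whether the app records this is not visible in the diff.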
src/config/__pycache__/settings.cpython-312.pyc
CHANGED
Binary files a/src/config/__pycache__/settings.cpython-312.pyc and b/src/config/__pycache__/settings.cpython-312.pyc differ
src/services/__pycache__/llm_client.cpython-312.pyc
CHANGED
Binary files a/src/services/__pycache__/llm_client.cpython-312.pyc and b/src/services/__pycache__/llm_client.cpython-312.pyc differ
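Note on the "Getting Started" steps added to src/app.py: the comma-separated field input (step 2) and the CSV export (step 5) might be handled along the lines of the sketch below. It is a hedged illustration only. `parse_field_names` and `results_to_csv` are hypothetical helpers, and the two-column `field,value` layout is an assumption, since the diff does not show the app's actual CSV format. Parsing the optional YAML field descriptions would need a third-party YAML package and is left out.

```python
import csv
import io


def parse_field_names(raw: str) -> list[str]:
    """Split a comma-separated string, dropping blanks and case-insensitive duplicates."""
    seen, fields = set(), []
    for name in raw.split(","):
        name = name.strip()
        if name and name.lower() not in seen:
            seen.add(name.lower())
            fields.append(name)
    return fields


def results_to_csv(fields: list[str], results: dict[str, str]) -> str:
    """Render one row per requested field; missing values become empty cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["field", "value"])  # assumed column layout
    for name in fields:
        writer.writerow([name, results.get(name, "")])
    return buffer.getvalue()
```

Normalizing the field list before extraction keeps stray whitespace and repeated names in the input from producing duplicate rows in the export.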