|
# LayoutLMv1 Quantized Model for Form Understanding (FUNSD) |
|
|
|
This repository hosts a quantized version of the LayoutLMv1 model, fine-tuned for document information extraction on the FUNSD dataset. The model is optimized for extracting entities such as questions, answers, and headers from scanned forms while maintaining efficient performance for deployment in production environments. |
|
|
|
## Model Details |
|
|
|
- **Model Architecture:** LayoutLMv1 (Hugging Face Transformers) |
|
- **Task:** Token Classification (Form Understanding / Information Extraction) |
|
- **Dataset:** [nielsr/funsd](https://huggingface.co/datasets/nielsr/funsd) |
|
- **Quantization:** Dynamic Quantization (INT8 for Linear layers) |
|
- **Fine-tuning Framework:** PyTorch & Hugging Face Transformers |
|
|
|
--- |
|
|
|
## Usage |
|
|
|
### Installation |
|
|
|
```sh
pip install transformers datasets torch torchvision
```
|
|
|
### Loading the Quantized Model |
|
|
|
```python
from transformers import LayoutLMForTokenClassification, LayoutLMTokenizer
import torch

# Rebuild the model architecture, re-apply dynamic quantization so the module
# structure matches the INT8 checkpoint, then load the quantized weights
model = LayoutLMForTokenClassification.from_pretrained("saved_model_quantized/")
model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
model.load_state_dict(torch.load("saved_model_quantized/pytorch_model.bin"))
model.eval()

# Load tokenizer
tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
```
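
### Running Inference

Once the model and tokenizer are loaded, inference follows the usual LayoutLMv1 pattern: OCR words plus their bounding boxes, normalized to a 0–1000 scale, with one box per sub-word token. The snippet below is a minimal sketch; the words and boxes are hypothetical placeholders for real OCR output.

```python
# Hypothetical OCR output: words and their boxes on a 0-1000 normalized scale
words = ["Date:", "01/02/2020"]
boxes = [[48, 84, 120, 100], [130, 84, 210, 100]]

# Tokenize word by word so every sub-word token keeps its word's box
tokens, token_boxes = [], []
for word, box in zip(words, boxes):
    word_tokens = tokenizer.tokenize(word)
    tokens.extend(word_tokens)
    token_boxes.extend([box] * len(word_tokens))

# Add special tokens with their conventional boxes
tokens = [tokenizer.cls_token] + tokens + [tokenizer.sep_token]
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
bbox = torch.tensor([token_boxes])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    outputs = model(input_ids=input_ids, bbox=bbox, attention_mask=attention_mask)

# Map each token to its predicted label (id2label comes from config.json)
predictions = outputs.logits.argmax(-1).squeeze().tolist()
print([model.config.id2label[p] for p in predictions])
```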
|
|
|
--- |
|
|
|
## Performance Metrics |
|
|
|
| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 0     | 73.67         | 22.15           |
| 1     | 37.13         | 15.65           |
| 2     | 26.30         | 16.39           |
| 3     | 17.68         | 14.45           |
| 4     | 11.12         | 17.16           |
|
|
|
> **Note:** Validation loss reaches its minimum at Epoch 3 (14.45) and rises again at Epoch 4, so the best checkpoint is likely the one from Epoch 3.
|
|
|
--- |
|
|
|
## Fine-Tuning Details |
|
|
|
### Dataset |
|
|
|
- **Name:** `nielsr/funsd` |
|
- **Description:** A benchmark of 199 noisy scanned forms annotated with semantic entities (questions, answers, headers, and other) for form understanding; it can be loaded directly with the `datasets` library (see below).
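
For reference, the dataset can be pulled straight from the Hub with the `datasets` library:

```python
from datasets import load_dataset

# FUNSD ships with train and test splits; each example provides the words,
# their bounding boxes, and token-level NER tags
dataset = load_dataset("nielsr/funsd")
print(dataset)
```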
|
|
|
### Training Configuration |
|
|
|
- **Epochs:** 5 |
|
- **Batch Size:** Chosen to fit available GPU memory
|
- **Learning Rate:** 4e-5 |
|
- **Optimizer:** AdamW |
|
- **Loss Function:** CrossEntropyLoss (token classification) |
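
The configuration above corresponds to a standard token-classification training loop. The sketch below is illustrative only; `train_dataloader`, the batch layout, and label alignment are assumptions rather than the exact training script.

```python
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=4e-5)
model.train()

for epoch in range(5):
    for batch in train_dataloader:  # assumed to yield aligned, batched tensors
        outputs = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            bbox=batch["bbox"],
            labels=batch["labels"],  # CrossEntropyLoss computed internally
        )
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```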
|
|
|
--- |
|
|
|
## Quantization |
|
|
|
Dynamic post-training quantization was applied using PyTorch: |
|
|
|
```python
import torch
from transformers import LayoutLMForTokenClassification

# Start from the full-precision fine-tuned checkpoint
model_fp32 = LayoutLMForTokenClassification.from_pretrained("saved_model/")

# Quantize all Linear layers to INT8 (weights stored as INT8, activations
# quantized dynamically at inference time)
quantized_model = torch.quantization.quantize_dynamic(
    model_fp32,
    {torch.nn.Linear},
    dtype=torch.qint8
)

torch.save(quantized_model.state_dict(), "saved_model_quantized/pytorch_model.bin")
```
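
A quick sanity check on the effect of quantization is to compare the serialized checkpoint sizes (the paths follow the repository structure below; the exact numbers depend on the checkpoints):

```python
import os

# Compare on-disk size of the FP32 and INT8 checkpoints, in megabytes
fp32_mb = os.path.getsize("saved_model/pytorch_model.bin") / 1e6
int8_mb = os.path.getsize("saved_model_quantized/pytorch_model.bin") / 1e6
print(f"FP32 checkpoint: {fp32_mb:.1f} MB | INT8 checkpoint: {int8_mb:.1f} MB")
```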
|
|
|
--- |
|
|
|
## Repository Structure |
|
|
|
```
.
├── saved_model/              # Original fine-tuned LayoutLMv1
├── saved_model_quantized/    # INT8 quantized model files
│   ├── config.json
│   └── pytorch_model.bin
└── README.md                 # Project documentation
```
|
|
|
--- |
|
|
|
## Limitations |
|
|
|
- Trained only on the FUNSD dataset; may not generalize to other document formats. |
|
- Quantized version may have slight degradation in performance compared to the full-precision model. |
|
- Requires OCR output (words and their bounding boxes) as input; the model does not perform OCR itself.
|
|
|
--- |
|
|
|
## Contributing |
|
|
|
Contributions are welcome! If you have suggestions, improvements, or find bugs, feel free to open an issue or submit a pull request. |
|
|