Upload MODEL_CARD.md with huggingface_hub
MODEL_CARD.md: changed (+2 -13)
@@ -1,4 +1,4 @@
-# Boolean Search Query
+# Boolean Search Query Model
 
 This model is fine-tuned to convert natural language queries into boolean search expressions, optimized for academic and research database searching.
 
@@ -61,7 +61,7 @@ Fine-tuned: "artificial intelligence" AND (ethics OR regulation OR policy) # Pr
 
 The model was trained on a curated dataset of natural language queries paired with their correct boolean translations. Dataset characteristics:
 
-- Size:
+- Size: 135 examples
 - Format: Natural query → Boolean expression pairs
 - Source: Manually curated academic search examples
 - Validation: Expert-reviewed for accuracy
@@ -69,9 +69,6 @@ The model was trained on a curated dataset of natural language queries paired wi
 ## Training Process
 
 - **Method**: LoRA fine-tuning
-- **Epochs**: 6
-- **Learning Rate**: 5e-5 with cosine scheduling
-- **Batch Size**: 16 (4 per device × 4 gradient accumulation steps)
 - **Hardware**: NVIDIA GeForce RTX 4070 Ti SUPER
 
 ## How to Use
@@ -150,14 +147,6 @@ result = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print(result) # "climate change" AND "renewable energy"
 ```
 
-## Evaluation Results
-
-Our test suite demonstrates consistent improvements over the base model in key areas:
-1. Meta-term removal accuracy: 100%
-2. Proper multi-word term quoting: 95%
-3. Logical grouping accuracy: 98%
-4. Minimal formatting adherence: 97%
-
 ## Citation
 
 If you use this model in your research, please cite: