# Boolean Search Query Model
This model is fine-tuned to convert natural language queries into boolean search expressions, optimized for academic and research database searching.
## Model Details
- **Base Model**: Meta-Llama-3.1-8B
- **Training Type**: LoRA fine-tuning
- **Task**: Converting natural language to boolean search queries
- **Languages**: English
- **License**: Same as the base model (Llama 3.1 Community License)
## Intended Use
- Converting natural language search requests into proper boolean expressions
- Academic and research database searching
- Information retrieval query formulation
## Performance
### Test Results
Base Model vs Fine-tuned Model comparison:
```
Natural Query: "Studies examining the relationship between exercise and mental health"
Base: exercise AND mental health
Fine-tuned: exercise AND "mental health" # Properly handles multi-word terms

Natural Query: "Articles about artificial intelligence ethics and regulation or policy"
Base: "artificial intelligence ethics" AND ("regulation" OR "policy") # Treats AI ethics as one concept
Fine-tuned: "artificial intelligence" AND (ethics OR regulation OR policy) # Properly splits concepts
```
### Key Improvements
1. Meta-term Removal
   - Automatically removes terms like "articles", "papers", "research", "studies"
   - Focuses on actual search concepts
2. Proper Term Quoting
   - Only quotes multi-word phrases
   - Single words remain unquoted
3. Logical Grouping
   - Appropriate use of parentheses for OR groups
   - Clear operator precedence
4. Minimal Formatting
   - No unnecessary parentheses
   - No duplicate terms
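The rules above are mechanical enough to check automatically. The following is a minimal sketch of such a check, not part of the model or its training pipeline; the function name `check_boolean_query` and the short meta-term list are illustrative assumptions. It only looks at whole terms, so a meta-term inside a quoted phrase is not flagged.

```python
import re

# Hypothetical helper: flags outputs that break the quoting, meta-term,
# and duplicate-term rules listed above. Not part of the released model.
META_TERMS = {"articles", "papers", "research", "studies"}
OPERATORS = {"AND", "OR", "NOT"}


def check_boolean_query(query: str) -> list[str]:
    """Return a list of rule violations found in a boolean query string."""
    problems = []
    # Tokens are quoted phrases, single parentheses, or bare words.
    tokens = re.findall(r'"[^"]*"|\(|\)|[^\s()"]+', query)
    prev_bare = False  # True when the previous token was an unquoted term
    seen = set()
    for tok in tokens:
        if tok in ("(", ")") or tok in OPERATORS:
            prev_bare = False
            continue
        if tok.startswith('"'):
            term = tok.strip('"')
            if len(term.split()) < 2:
                problems.append(f"single word quoted: {tok}")
            prev_bare = False
        else:
            term = tok
            # Two bare words in a row mean a multi-word phrase was left unquoted.
            if prev_bare:
                problems.append(f"unquoted multi-word phrase near: {tok}")
            prev_bare = True
        key = term.lower()
        if key in META_TERMS:
            problems.append(f"meta-term present: {term}")
        if key in seen:
            problems.append(f"duplicate term: {term}")
        seen.add(key)
    return problems


print(check_boolean_query('exercise AND "mental health"'))
print(check_boolean_query('exercise AND mental health'))
```

Applied to the two outputs from the first test result, the fine-tuned output passes with no violations, while the base output is flagged for the unquoted phrase `mental health`.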
## Limitations
- English language only
- May not handle specialized domain terminology optimally
- Limited to boolean operators (AND, OR, NOT)
- Designed for academic/research context
## Training Data
The model was trained on a curated dataset of natural language queries paired with their correct boolean translations. Dataset characteristics:
- Size: 135 examples
- Format: Natural query → Boolean expression pairs
- Source: Manually curated academic search examples
- Validation: Expert-reviewed for accuracy
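As an illustration of the pair format, the sketch below renders one query/expression pair into the Alpaca-style prompt layout that this card uses at inference time. That the training examples were rendered with exactly this template is an assumption, and the names `ALPACA_TEMPLATE` and `format_example` are hypothetical. The sample pair is taken from the example conversions in the usage prompt, not from the dataset itself.

```python
# Hypothetical sketch: render a (query, boolean expression) pair into the
# instruction / input / response layout used elsewhere in this card.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{query}\n\n"
    "### Response:\n{response}"
)


def format_example(instruction: str, query: str, response: str = "") -> str:
    """Fill the template; leave `response` empty to build an inference prompt."""
    return ALPACA_TEMPLATE.format(
        instruction=instruction, query=query, response=response
    )


example = format_example(
    instruction="Convert this natural language query into a boolean search query.",
    query="Research on machine learning for natural language processing",
    response='"machine learning" AND "natural language processing"',
)
print(example)
```

Leaving `response` empty gives a prompt that ends at the response marker, which is the shape the model expects at generation time.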
## Training Process
- **Method**: LoRA fine-tuning
- **Hardware**: NVIDIA GeForce RTX 4070 Ti SUPER
## How to Use
```python
from unsloth import FastLanguageModel
# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
"Zwounds/boolean-search-model",
max_seq_length=2048,
dtype=None, # Auto-detect
load_in_4bit=True
)
FastLanguageModel.for_inference(model)
# Format query
query = "Find papers about climate change and renewable energy"
formatted = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Convert this natural language query into a boolean search query by following these rules:
1. FIRST: Remove all meta-terms from this list (they should NEVER appear in output):
- articles, papers, research, studies
- examining, investigating, analyzing
- findings, documents, literature
- publications, journals, reviews
Example: "Research examining X" → just "X"
2. SECOND: Remove generic implied terms that don't add search value:
- Remove words like "practices," "techniques," "methods," "approaches," "strategies"
- Remove words like "impacts," "effects," "influences," "role," "applications"
- For example: "sustainable agriculture practices" → "sustainable agriculture"
- For example: "teaching methodologies" → "teaching"
- For example: "leadership styles" → "leadership"
3. THEN: Format the remaining terms:
CRITICAL QUOTING RULES:
- Multi-word phrases MUST ALWAYS be in quotes - NO EXCEPTIONS
- Examples of correct quoting:
- Wrong: machine learning AND deep learning
- Right: "machine learning" AND "deep learning"
- Wrong: natural language processing
- Right: "natural language processing"
- Single words must NEVER have quotes (e.g., science, research, learning)
- Use AND to connect required concepts
- Use OR with parentheses for alternatives (e.g., ("soil health" OR biodiversity))
Example conversions showing proper quoting:
"Research on machine learning for natural language processing"
→ "machine learning" AND "natural language processing"
"Studies examining anxiety depression stress in workplace"
→ (anxiety OR depression OR stress) AND workplace
"Articles about deep learning impact on computer vision"
→ "deep learning" AND "computer vision"
"Research on sustainable agriculture practices and their impact on soil health or biodiversity"
→ "sustainable agriculture" AND ("soil health" OR biodiversity)
"Articles about effective teaching methods for second language acquisition"
→ teaching AND "second language acquisition"
### Input:
{query}
### Response:
"""
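# Generate boolean query (inputs must be on the same device as the model)
inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
# outputs[0] contains the prompt followed by the completion; decode only the new tokens
prompt_len = inputs["input_ids"].shape[1]
result = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()
print(result) # e.g. "climate change" AND "renewable energy"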
```
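Decoded text can still carry the prompt or extra lines after the answer, depending on how the output is decoded and when generation stops. The helper below is a minimal post-processing sketch that keeps only the first non-empty line after the response marker; the name `extract_boolean_query` is hypothetical and not part of the model or of Unsloth.

```python
def extract_boolean_query(decoded: str, marker: str = "### Response:") -> str:
    """Return the first non-empty line that follows the last response marker.

    If the marker is absent (the text is already just the completion),
    the first non-empty line of the whole text is returned.
    """
    _, _, tail = decoded.rpartition(marker)
    for line in tail.splitlines():
        line = line.strip()
        if line:
            return line
    return ""


full_text = (
    "### Input:\nFind papers about climate change and renewable energy\n\n"
    '### Response:\n"climate change" AND "renewable energy"\n\nextra text'
)
print(extract_boolean_query(full_text))
```

Taking only the first line is a design choice that assumes the model emits the boolean expression on a single line, as in the examples in this card.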
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{boolean-search-llm,
title={Boolean Search Query LLM},
author={Stephen Zweibel},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/Zwounds/boolean-search-model}
}
```