---
tags:
- transformers
- llama
- boolean-search
- search
- language-to-query
library_name: transformers
pipeline_tag: text2text-generation
license: llama2
title: boolean-search-query-model
emoji: π
sdk: gradio
sdk_version: 4.0.0
app_file: demo.py
---

# Boolean Search Query Model

Converts natural language queries into boolean search expressions for academic databases. The model helps researchers and librarians produce correctly formatted boolean queries from plain-language descriptions.

## Features

- Converts natural language to boolean search expressions
- Puts multi-word terms in quotes (correct in most cases, but not all)
- Removes meta-terms (articles, papers, research, etc.)
- Groups OR clauses appropriately
- Minimal, clean formatting

## Installation

```bash
pip install transformers torch unsloth
```

Load the model and tokenizer with Unsloth:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "Zwounds/boolean-search-model",
    max_seq_length=2048,
    dtype=None,  # Auto-detect
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # Switch to inference mode
```

## Quick Start

```python
# Uses `model` and `tokenizer` as loaded in the Installation section.

# Format your query
query = "Find papers about climate change and renewable energy"
prompt = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Convert this natural language query into a boolean search query by following these rules:

1. FIRST: Remove all meta-terms from this list (they should NEVER appear in output):
- articles, papers, research, studies
- examining, investigating, analyzing
- findings, documents, literature
- publications, journals, reviews
Example: "Research examining X" → just "X"

2. SECOND: Remove generic implied terms that don't add search value:
- Remove words like "practices," "techniques," "methods," "approaches," "strategies"
- Remove words like "impacts," "effects," "influences," "role," "applications"
- For example: "sustainable agriculture practices" → "sustainable agriculture"
- For example: "teaching methodologies" → "teaching"
- For example: "leadership styles" → "leadership"

3. THEN: Format the remaining terms:
CRITICAL QUOTING RULES:
- Multi-word phrases MUST ALWAYS be in quotes - NO EXCEPTIONS
- Examples of correct quoting:
- Wrong: machine learning AND deep learning
- Right: "machine learning" AND "deep learning"
- Wrong: natural language processing
- Right: "natural language processing"
- Single words must NEVER have quotes (e.g., science, research, learning)
- Use AND to connect required concepts
- Use OR with parentheses for alternatives (e.g., ("soil health" OR biodiversity))

Example conversions showing proper quoting:
"Research on machine learning for natural language processing"
→ "machine learning" AND "natural language processing"

"Studies examining anxiety depression stress in workplace"
→ (anxiety OR depression OR stress) AND workplace

"Articles about deep learning impact on computer vision"
→ "deep learning" AND "computer vision"

"Research on sustainable agriculture practices and their impact on soil health or biodiversity"
→ "sustainable agriculture" AND ("soil health" OR biodiversity)

"Articles about effective teaching methods for second language acquisition"
→ teaching AND "second language acquisition"

### Input:
{query}

### Response:
"""

# Generate boolean query (inputs must be on the same device as the model)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)

# outputs[0] holds the prompt tokens followed by the completion,
# so decode only the newly generated part
prompt_length = inputs["input_ids"].shape[1]
result = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
print(result)  # e.g. "climate change" AND "renewable energy"
```

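If the whole generated sequence is decoded, the resulting text contains the prompt followed by the model's answer. The helper below is a minimal sketch for keeping only the answer. It assumes the `### Response:` marker from the prompt above and assumes the boolean query is the first non-empty line after it; `extract_response` is a hypothetical name and not part of this repository.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str) -> str:
    """Return the first non-empty line after the last response marker."""
    _, found, completion = decoded.rpartition(RESPONSE_MARKER)
    if not found:
        # No marker present: assume the text is already just the completion.
        return decoded.strip()
    lines = [line.strip() for line in completion.splitlines() if line.strip()]
    return lines[0] if lines else ""


decoded = (
    "### Input:\nFind papers about climate change and renewable energy\n\n"
    '### Response:\n"climate change" AND "renewable energy"\n'
)
print(extract_response(decoded))  # "climate change" AND "renewable energy"
```

Taking only the first line is a design choice that guards against the model continuing past the query; drop it if multi-line output is expected.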
## Examples

Input queries and their boolean translations:

1. Natural: "Studies about anxiety depression stress in workplace"
   - Boolean: (anxiety OR depression OR stress) AND workplace

2. Natural: "Articles about artificial intelligence ethics and regulation or policy"
   - Boolean: "artificial intelligence" AND (ethics OR regulation OR policy)

3. Natural: "Research on quantum computing applications in cryptography or optimization"
   - Boolean: "quantum computing" AND (cryptography OR optimization)

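To see what such an expression selects, here is a toy evaluator that checks a piece of text against a boolean query. It is only a sketch under stated assumptions: matching is case-insensitive substring search, AND binds tighter than OR, and the query is well formed. Real academic databases apply their own precedence and matching rules, and `matches` is a hypothetical helper, not part of this model.

```python
import re

# Quoted phrases, single parentheses, or bare words
TOKEN_RE = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def matches(query: str, text: str) -> bool:
    """Evaluate a well-formed boolean query against text (toy semantics)."""
    tokens = TOKEN_RE.findall(query)
    text = text.lower()
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
            right = parse_and()  # always parse, even if result is already True
            result = result or right
        return result

    def parse_and() -> bool:
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "AND":
            pos += 1
            right = parse_atom()
            result = result and right
        return result

    def parse_atom() -> bool:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            pos += 1  # skip the closing ")"
            return result
        # Simplification: substring match, so "stress" also matches "distress"
        return token.strip('"').lower() in text

    return parse_or()


query = '"quantum computing" AND (cryptography OR optimization)'
print(matches(query, "Quantum computing for portfolio optimization"))  # True
print(matches(query, "Quantum computing hardware roadmap"))  # False
```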
## Rules

The model follows these formatting rules:

1. Meta-terms are removed:
   - "articles", "papers", "research", "studies"
   - Focus on actual search concepts

2. Quotes only for multi-word terms:
   - "artificial intelligence" AND ethics ✓
   - NOT: "ethics" AND "ai" ✗

3. Logical grouping:
   - Use parentheses for OR groups
   - (x OR y) AND z

4. Minimal formatting:
   - No unnecessary parentheses
   - No repeated terms

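Because the model does not always get the quoting right, it can help to check its output against the rules above before sending it to a database. The following is a minimal sketch of such a check. `check_query` is a hypothetical helper rather than part of this repository, it uses only the four meta-terms listed in rule 1, and it does not try to detect unnecessary parentheses.

```python
import re

# Meta-terms named in rule 1; the Quick Start prompt lists more.
META_TERMS = {"articles", "papers", "research", "studies"}
OPERATORS = {"AND", "OR"}
TOKEN_RE = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def check_query(query: str) -> list[str]:
    """Return rule violations found in a boolean query (empty list = none found)."""
    problems = []

    # Rule 3: parentheses must be balanced.
    depth = 0
    for char in query:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if depth < 0:
            break
    if depth != 0:
        problems.append("unbalanced parentheses")

    seen = set()
    previous_was_term = False
    for token in TOKEN_RE.findall(query):
        if token in ("(", ")"):
            continue
        if token in OPERATORS:
            previous_was_term = False
            continue

        is_quoted = token.startswith('"')
        term = token.strip('"').lower()
        words = term.split()

        # Rule 2: quotes are for multi-word terms only.
        if is_quoted and len(words) < 2:
            problems.append(f"single word in quotes: {term}")
        # Two terms in a row mean an unquoted phrase or a missing operator.
        if previous_was_term:
            problems.append(f"unquoted phrase or missing operator before: {term}")

        # Rule 1: no meta-terms.
        if any(word in META_TERMS for word in words):
            problems.append(f"meta-term present: {term}")

        # Rule 4: no repeated terms.
        if term in seen:
            problems.append(f"repeated term: {term}")
        seen.add(term)
        previous_was_term = True

    return problems


print(check_query('"artificial intelligence" AND ethics'))  # []
print(check_query('"ethics" AND "ai"'))
# ['single word in quotes: ethics', 'single word in quotes: ai']
```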
## Local Development

```bash
# Clone repo
git clone https://github.com/your-username/boolean-search-model.git
cd boolean-search-model

# Install dependencies
pip install -r requirements.txt

# Run tests
python test_boolean_model.py
```

## Contributing

1. Fork the repository
2. Create your feature branch
3. Add tests for any new functionality
4. Submit a pull request

## Model Card

See [MODEL_CARD.md](MODEL_CARD.md) for detailed model information including:
- Training data details
- Performance metrics
- Limitations
- Intended use cases

## License

This model is subject to the Llama 2 license. See the [LICENSE](LICENSE) file for details.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{boolean-search-llm,
  title={Boolean Search Query LLM},
  author={Stephen Zweibel},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Zwounds/boolean-search-model}
}
```

## Contact

Stephen Zweibel - [@szweibel](https://github.com/szweibel)