# Boolean Search Query Model This model is fine-tuned to convert natural language queries into boolean search expressions, optimized for academic and research database searching. ## Model Details - **Base Model**: Meta-Llama-3.1-8B - **Training Type**: LoRA fine-tuning - **Task**: Converting natural language to boolean search queries - **Languages**: English - **License**: Same as base model ## Intended Use - Converting natural language search requests into proper boolean expressions - Academic and research database searching - Information retrieval query formulation ## Performance ### Test Results Base Model vs Fine-tuned Model comparison: ``` Natural Query: "Studies examining the relationship between exercise and mental health" Base: exercise AND mental health Fine-tuned: exercise AND "mental health" # Properly handles multi-word terms Natural Query: "Articles about artificial intelligence ethics and regulation or policy" Base: "artificial intelligence ethics" AND ("regulation" OR "policy") # Treats AI ethics as one concept Fine-tuned: "artificial intelligence" AND (ethics OR regulation OR policy) # Properly splits concepts ``` ### Key Improvements 1. Meta-term Removal - Automatically removes terms like "articles", "papers", "research", "studies" - Focuses on actual search concepts 2. Proper Term Quoting - Only quotes multi-word phrases - Single words remain unquoted 3. Logical Grouping - Appropriate use of parentheses for OR groups - Clear operator precedence 4. Minimal Formatting - No unnecessary parentheses - No duplicate terms ## Limitations - English language only - May not handle specialized domain terminology optimally - Limited to boolean operators (AND, OR, NOT) - Designed for academic/research context ## Training Data The model was trained on a curated dataset of natural language queries paired with their correct boolean translations. Dataset characteristics: - Size: 135 examples - Format: Natural query → Boolean expression pairs - Source: Manually curated academic search examples - Validation: Expert-reviewed for accuracy ## Training Process - **Method**: LoRA fine-tuning - **Hardware**: NVIDIA GeForce RTX 4070 Ti SUPER ## How to Use ```python from unsloth import FastLanguageModel # Load model model, tokenizer = FastLanguageModel.from_pretrained( "Zwounds/boolean-search-model", max_seq_length=2048, dtype=None, # Auto-detect load_in_4bit=True ) FastLanguageModel.for_inference(model) # Format query query = "Find papers about climate change and renewable energy" formatted = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Convert this natural language query into a boolean search query by following these rules: 1. FIRST: Remove all meta-terms from this list (they should NEVER appear in output): - articles, papers, research, studies - examining, investigating, analyzing - findings, documents, literature - publications, journals, reviews Example: "Research examining X" → just "X" 2. SECOND: Remove generic implied terms that don't add search value: - Remove words like "practices," "techniques," "methods," "approaches," "strategies" - Remove words like "impacts," "effects," "influences," "role," "applications" - For example: "sustainable agriculture practices" → "sustainable agriculture" - For example: "teaching methodologies" → "teaching" - For example: "leadership styles" → "leadership" 3. THEN: Format the remaining terms: CRITICAL QUOTING RULES: - Multi-word phrases MUST ALWAYS be in quotes - NO EXCEPTIONS - Examples of correct quoting: - Wrong: machine learning AND deep learning - Right: "machine learning" AND "deep learning" - Wrong: natural language processing - Right: "natural language processing" - Single words must NEVER have quotes (e.g., science, research, learning) - Use AND to connect required concepts - Use OR with parentheses for alternatives (e.g., ("soil health" OR biodiversity)) Example conversions showing proper quoting: "Research on machine learning for natural language processing" → "machine learning" AND "natural language processing" "Studies examining anxiety depression stress in workplace" → (anxiety OR depression OR stress) AND workplace "Articles about deep learning impact on computer vision" → "deep learning" AND "computer vision" "Research on sustainable agriculture practices and their impact on soil health or biodiversity" → "sustainable agriculture" AND ("soil health" OR biodiversity) "Articles about effective teaching methods for second language acquisition" → teaching AND "second language acquisition" ### Input: {query} ### Response: """ # Generate boolean query inputs = tokenizer(formatted, return_tensors="pt") outputs = model.generate(**inputs, max_new_tokens=100) result = tokenizer.decode(outputs[0], skip_special_tokens=True) print(result) # "climate change" AND "renewable energy" ``` ## Citation If you use this model in your research, please cite: ```bibtex @misc{boolean-search-llm, title={Boolean Search Query LLM}, author={Stephen Zweibel}, year={2025}, publisher={Hugging Face}, url={https://huggingface.co/Zwounds/boolean-search-model} }