---
base_model: unsloth/mistral-7b-instruct-v0.3-bnb-4bit
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- bias-detection
- logical-fallacy
- critical-thinking
- rationality
- unsloth
language:
- en
---

# Model Card for rationality-debugger-v1.0

This is a QLoRA adapter dedicated to identifying sophisms (logical fallacies) and cognitive biases.
Its current performance is 85%-100% for detecting sophisms and 85%-100% for detecting cognitive biases.

It was trained on a custom dataset of 14k lines.

## Model Details

### Model Description


- **Developed by:** Arthur Vigier
- **Model type:** QLoRA
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** unsloth/mistral-7b-instruct-v0.3-bnb-4bit
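
A LoRA adapter is meant to be applied to the checkpoint it was fine-tuned from, so it can be worth checking which base model a downloaded copy of the adapter records. The sketch below assumes the adapter files are already on disk in a local folder and that they include the `adapter_config.json` file that PEFT writes when saving an adapter. The helper name `adapter_base_model` and the folder path in the usage line are hypothetical.

```python
import json
from pathlib import Path


def adapter_base_model(adapter_dir):
    """Return the base model ID recorded in a PEFT adapter folder."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # PEFT stores the checkpoint the adapter was trained on under this key
    return config.get("base_model_name_or_path")


if __name__ == "__main__":
    # Hypothetical local path to a downloaded copy of the adapter
    local_dir = Path("./rationality-debugger-v1.0")
    if local_dir.exists():
        print(adapter_base_model(local_dir))
```

If the printed ID differs from the base model you plan to load, the adapter weights may not line up with that checkpoint.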

## Uses

It is intended for anyone who wants to assess public discourse by examining the foundations of its language and the soundness of its reasoning.
It is also well suited to education and to strengthening critical thinking.

### API

A public API is coming soon.

### Performance chart

#### Sophism
<img src="https://imgur.com/Vby0Ocq.png" width="500"/>

#### Cognitive Bias
<img src="https://imgur.com/RbGxSyN.png" width="500"/>

### Direct Use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import re

class RationalityDebugger:
    def __init__(self, base_model="mistralai/Mistral-7B-Instruct-v0.3", lora_model="Artvv/rationality-debugger-v1.0"):
        """
        Initialize the cognitive bias and logical fallacy detector.
        
        Args:
            base_model: Base model from Hugging Face
            lora_model: LoRA adapters for rationality analysis
        """
        print(f"Loading base model: {base_model}")
        self.tokenizer = AutoTokenizer.from_pretrained(base_model)
        
        # Options for optimized loading
        model_kwargs = {
            "torch_dtype": torch.float16,
            "device_map": "auto",
            "low_cpu_mem_usage": True
        }
        
        # Try first with 4-bit quantization to save memory
        try:
            from transformers import BitsAndBytesConfig
            quantization_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_compute_dtype=torch.float16,
                bnb_4bit_use_double_quant=True
            )
            model_kwargs["quantization_config"] = quantization_config
            self.base_model = AutoModelForCausalLM.from_pretrained(base_model, **model_kwargs)
        except Exception as exc:
            # Fallback if bitsandbytes is not available: retry without quantization
            print(f"4-bit quantization not available ({exc}), using standard loading...")
            model_kwargs.pop("quantization_config", None)
            self.base_model = AutoModelForCausalLM.from_pretrained(base_model, **model_kwargs)
            
        print(f"Applying LoRA adapters: {lora_model}")
        self.model = PeftModel.from_pretrained(self.base_model, lora_model)
        self.model.eval()  # Evaluation mode
        
        self.prompt_template = """
Analyze the following argument and identify any logical fallacies or cognitive biases:

{text}

###OUTPUT FORMAT
[Argument] Valid/Invalid
   → If Valid: Type: [ANALYTICAL / INDUCTIVE / ABDUCTIVE]
[Sophisms] Yes/No
   → If Yes: Which: [List detected fallacies]
   → Extract(s): [Provide exact snippet(s)]
[Biases] Yes/No
   → If Yes: Which: [List detected biases]
   → Extract(s): [Provide exact snippet(s)]

[Short explanation]
"""

    def analyze(self, text, max_new_tokens=200, temperature=0.1):
        """
        Analyze text to detect cognitive biases and logical fallacies.
        
        Args:
            text: Text to analyze
            max_new_tokens: Maximum number of new tokens to generate
            temperature: Temperature for generation (lower = more deterministic)
            
        Returns:
            dict: Structured analysis result and raw text
        """
        prompt = self.prompt_template.format(text=text)
        
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                temperature=temperature,
                top_p=0.9,
                do_sample=temperature > 0
            )
        
        # Extract only the generated part (not the prompt)
        generated_text = self.tokenizer.decode(
            outputs[0][inputs.input_ids.shape[1]:], 
            skip_special_tokens=True
        )
        
        # Parse the response to extract the structure
        result = self._parse_response(generated_text)
        
        return {
            "raw_text": generated_text,
            "structured": result
        }
    
    def _parse_response(self, text):
        """Parse the model's response to extract structured information"""
        result = {
            "argument_valid": None,
            "argument_type": None,
            "has_sophisms": None,
            "detected_sophisms": [],
            "has_biases": None,
            "detected_biases": [],
            "too_short": False,
            "explanation": ""
        }
        
        # Simple parsing example - adapt as needed
        text_lower = text.lower()
        
        # Argument validity detection. "invalid argument" contains the substring
        # "valid argument", so the invalid case must be tested first.
        if "invalid argument" in text_lower or "[argument] invalid" in text_lower:
            result["argument_valid"] = False
        elif "valid argument" in text_lower or "[argument] valid" in text_lower:
            result["argument_valid"] = True
            
        # Argument type detection
        for arg_type in ["ANALYTICAL", "INDUCTIVE", "ABDUCTIVE"]:
            if arg_type.lower() in text_lower:
                result["argument_type"] = arg_type
        
        # Fallacy detection
        sophism_keywords = ["ad hominem", "straw man", "red herring", "false dilemma", 
                           "slippery slope", "post hoc", "circular reasoning"]
        
        for sophism in sophism_keywords:
            if sophism in text_lower:
                result["detected_sophisms"].append(sophism)
                
        result["has_sophisms"] = len(result["detected_sophisms"]) > 0
        
        # Cognitive bias detection
        bias_keywords = ["confirmation bias", "availability bias", "anchoring bias", 
                         "hindsight bias", "halo effect", "dunning-kruger"]
        
        for bias in bias_keywords:
            if bias in text_lower:
                result["detected_biases"].append(bias)
        
        result["has_biases"] = len(result["detected_biases"]) > 0
        
        # Explanation
        explanation_match = re.search(r"\[Short explanation\](.*?)(?=$|\[)", text, re.DOTALL)
        if explanation_match:
            result["explanation"] = explanation_match.group(1).strip()
        else:
            # If no explanation tag, take the whole text
            result["explanation"] = text
            
        return result


# --- Usage example ---
if __name__ == "__main__":
    # Create the analyzer
    analyzer = RationalityDebugger(
        base_model="mistralai/Mistral-7B-Instruct-v0.3",
        lora_model="Artvv/rationality-debugger-v1.0"
    )
    
    # Analysis example
    argument = """
    All birds can fly. Penguins are birds. Therefore, penguins can fly.
    """
    
    result = analyzer.analyze(argument)
    
    # Display raw result
    print("\n=== RAW ANALYSIS ===")
    print(result["raw_text"])
    
    # Display structured result
    print("\n=== STRUCTURED ANALYSIS ===")
    print(f"Valid argument: {result['structured']['argument_valid']}")
    
    if result["structured"]["detected_sophisms"]:
        print("\nDetected fallacies:")
        for sophism in result["structured"]["detected_sophisms"]:
            print(f"- {sophism}")
    
    if result["structured"]["detected_biases"]:
        print("\nDetected cognitive biases:")
        for bias in result["structured"]["detected_biases"]:
            print(f"- {bias}")
    
    print("\nExplanation:")
    print(result["structured"]["explanation"])
```
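
The `_parse_response` method above relies on keyword matching over the whole response. As an alternative, here is a minimal sketch that reads the tagged lines of the output format given in the prompt. It assumes the model follows that layout and lists detected items on a single comma-separated `Which:` line. The function `parse_tagged_output` and the sample text are hypothetical and are not part of the released code.

```python
import re


def parse_tagged_output(text):
    """Parse the [Argument] / [Sophisms] / [Biases] layout into a dict."""
    result = {
        "argument_valid": None,
        "has_sophisms": None,
        "sophisms": [],
        "has_biases": None,
        "biases": [],
    }

    # [Argument] Valid / Invalid
    match = re.search(r"\[Argument\]\s*(Valid|Invalid)\b", text, re.IGNORECASE)
    if match:
        result["argument_valid"] = match.group(1).lower() == "valid"

    # [Sophisms] and [Biases]: a Yes/No flag, then an optional "Which:" line
    for tag, flag_key, list_key in (
        ("Sophisms", "has_sophisms", "sophisms"),
        ("Biases", "has_biases", "biases"),
    ):
        match = re.search(rf"\[{tag}\]\s*(Yes|No)\b", text, re.IGNORECASE)
        if not match:
            continue
        result[flag_key] = match.group(1).lower() == "yes"
        if not result[flag_key]:
            continue
        # Only look at the text between this tag and the next line starting with "["
        section = text[match.end():]
        next_tag = re.search(r"^\[", section, re.MULTILINE)
        if next_tag:
            section = section[:next_tag.start()]
        which = re.search(r"Which:\s*(.+)", section)
        if which:
            items = which.group(1).strip().strip("[]").split(",")
            result[list_key] = [item.strip() for item in items if item.strip()]
    return result


# Hypothetical model output, written by hand for illustration
sample = """[Argument] Invalid
[Sophisms] Yes
   → If Yes: Which: ad hominem, straw man
   → Extract(s): "you would say that"
[Biases] No

[Short explanation] The reply attacks the speaker instead of the claim."""

if __name__ == "__main__":
    print(parse_tagged_output(sample))
```

Reading the explicit `Yes`/`No` flags keeps the result tied to what the model actually reported, while the keyword approach can also pick up fallacy names that only appear in the explanation.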

### Out-of-Scope Use

It is not intended to be used to harass or insult anyone.

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

The model is very effective on the most common sophisms and cognitive biases, but it can be less effective on more niche ones, such as the frequency illusion.
Its main purpose is to detect sophisms and cognitive biases. It can also recognize valid reasoning, but that is not its primary goal.

## Model Card Contact

Email: arvigier@gmail.com


### Framework versions

- PEFT 0.14.0

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)