|
--- |
|
library_name: transformers |
|
tags: |
|
- code |
|
license: apache-2.0 |
|
datasets: |
|
- gretelai/synthetic_text_to_sql |
|
language: |
|
- en |
|
base_model: |
|
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
|
--- |
|
|
|
# DeepSeek R1 Distill Qwen 1.5B fine-tuned for SQL query generation
|
This model is a fine-tuned version of DeepSeek R1 Distill Qwen 1.5B, specifically optimized for SQL query generation. It has been trained on the GretelAI Synthetic Text-to-SQL dataset to enhance its ability to convert natural language prompts into accurate SQL queries. |
|
|
|
Due to its lightweight architecture, this model can be deployed efficiently on local machines without a GPU, making it well suited for on-premises inference in resource-constrained environments. It balances performance and efficiency, offering businesses and developers a cost-effective SQL generation solution.
|
|
|
## Training Methodology |
|
1. Fine-tuning approach: LoRA (Low-Rank Adaptation) for efficient parameter tuning. |
|
2. Precision: bfloat16 (bf16) to reduce memory consumption while maintaining numerical stability. |
|
3. Gradient Accumulation: Used to simulate larger effective batch sizes within GPU memory limits.

4. Optimizer: AdamW.

5. Scheduler: Cosine learning rate schedule for training stability (500 warm-up steps, 2000 steps for the cosine schedule); see the configuration sketch after this list.
|
6. Hardware: Trained on 8xA100 GPUs with mixed precision training. |
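
A minimal sketch of this setup using 🤗 PEFT and Transformers is shown below. Only the precision, optimizer, scheduler, and step counts come from the list above; the LoRA rank/alpha, target modules, batch size, and learning rate are illustrative assumptions.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Load the base model in bf16 (item 2).
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    torch_dtype=torch.bfloat16,
)

# LoRA adapters (item 1); rank, alpha, and target modules are assumed values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# AdamW + cosine schedule with 500 warm-up steps over a 2000-step run,
# plus gradient accumulation (items 3-5); batch size and LR are assumed.
args = TrainingArguments(
    output_dir="sql-lora",
    bf16=True,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_steps=500,
    max_steps=2000,
)
# `args` would then be passed to a Trainer together with the
# gretelai/synthetic_text_to_sql dataset.
```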
|
|
|
## Use Cases |
|
1. Assisting developers and analysts in writing SQL queries. |
|
2. Automating SQL query generation from user prompts in chatbots. |
|
3. Enhancing SQL-based retrieval-augmented generation (RAG) systems. |
|
|
|
## Limitations & Considerations |
|
1. The model may generate incorrect or suboptimal SQL queries for complex database schemas. |
|
2. It does not perform schema reasoning and requires clear table/column references in the input (see the prompt example after this list).
|
3. Further fine-tuning on domain-specific SQL data may be required for better accuracy. |
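
Because of point 2, including the relevant `CREATE TABLE` statement directly in the prompt usually helps. A minimal sketch, using a hypothetical `sales` table:

```python
# Give the model explicit column names by embedding the schema in the prompt.
# The `sales` table definition here is hypothetical.
schema = """CREATE TABLE sales (
    id INTEGER PRIMARY KEY,
    product TEXT,
    quantity INTEGER,
    unit_price REAL,
    sold_at DATE
);"""

prompt = (
    f"Given the following schema:\n{schema}\n"
    "Write a SQL query to get the total revenue from the sales table."
)
```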
|
|
|
## How to Use |
|
You can load the model using 🤗 Transformers: |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "NotShrirang/sql-deepseek-r1-distill-qwen-1.5B"

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode the natural language prompt and generate the SQL query
prompt = "Write a SQL query to get the total revenue from the sales table."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
``` |
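
Because the model is small enough to run without a GPU (see above), CPU-only inference is also an option. A minimal sketch using the high-level `pipeline` API; throughput will depend on your hardware:

```python
from transformers import pipeline

# device=-1 forces CPU execution; omit it to let the pipeline pick a device.
generator = pipeline(
    "text-generation",
    model="NotShrirang/sql-deepseek-r1-distill-qwen-1.5B",
    device=-1,
)

result = generator(
    "Write a SQL query to get the total revenue from the sales table.",
    max_new_tokens=100,
)
print(result[0]["generated_text"])
```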
|
|
|
- **Developed by:** [NotShrirang](https://huggingface.co/NotShrirang) |
|
- **Language(s) (NLP):** English

- **License:** apache-2.0

- **Finetuned from model:** [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
|
|