Books Named Entity Recognition Model (w/ Categories)
This model specialises in recognising book titles, author names, and book categories in short, user-typed queries. It extends the empathyai/gliner_large-v2.5-books model.
1. Provenance
This model was created by fine-tuning empathyai/gliner_large-v2.5-books on the empathyai/books-ner-dataset-categories dataset, which consists of synthetic queries derived from Project Gutenberg's categories. It inherits the base model's ability to identify titles and authors and extends it with a new `category` entity.
2. Use-Case
This model is a drop-in replacement for its predecessor, especially when your text stream consists of bibliographic requests that also include genres or categories, for example:

- “Looking for Dune from Frank Herbert.”
- “Any recommendations by Mary Shelley?”
- “Show me some science fiction books”
Typical applications:
- Query understanding in library / e-book search engines (see the sketch after this list)
- Post-processing LLM output to structure reading lists and categorize books
- Digital humanities pipelines that need lightweight title, author, and category extraction

Not suitable for: recognising publishers, ISBNs, or long bibliography-style references (only short queries were used for training).
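As a rough illustration of the query-understanding use case, the sketch below groups the model's predictions into a simple search filter keyed by entity type. It assumes the repository ID and labels listed in this card; the `query_to_filter` helper is purely illustrative:

```python
from collections import defaultdict

from gliner import GLiNER

# Repository ID as listed in this card; adjust if you host the model elsewhere.
model = GLiNER.from_pretrained("empathyai/gliner_large-v2.5-books-extended")

def query_to_filter(query: str, threshold: float = 0.2) -> dict:
    """Group predicted entities by label, e.g. {'author': [...], 'category': [...]}."""
    entities = model.predict_entities(query, ["title", "author", "category"], threshold=threshold)
    search_filter = defaultdict(list)
    for ent in entities:
        search_filter[ent["label"]].append(ent["text"])
    return dict(search_filter)

print(query_to_filter("Looking for Dune from Frank Herbert."))
# e.g. {'title': ['Dune'], 'author': ['Frank Herbert']}
```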
3. Performance
This model maintains the high performance of the base model for the `title` and `author` entities while adding strong performance on the new `category` entity.

Base Model Performance (`title`, `author`)
The performance for the inherited labels, evaluated on the original held-out dataset, is as follows:
| Metric | Overall | `title` | `author` |
|---|---|---|---|
| Precision | 0.9999 | 0.9999 | 0.9999 |
| Recall | 0.8583 | 0.7661 | 0.9287 |
| F1-score | 0.9237 | 0.8675 | 0.9630 |
| Support | 69,880 | 30,290 | 39,590 |
New `category` Entity Performance

Performance for the newly added `category` label was evaluated on the empathyai/books-ner-dataset-categories held-out set.
| Metric | `category` |
|---|---|
| Precision | 0.9831 |
| Recall | 1.0000 |
| F1-score | 0.9915 |
| Support | 232 |
4. Quick Start
```python
from gliner import GLiNER

# Make sure to use your new model's repository ID
model = GLiNER.from_pretrained("empathyai/gliner_large-v2.5-books-extended")

text = "Show me some science fiction books by Philip K. Dick."

entities = model.predict_entities(text, ["title", "author", "category"], threshold=0.2)
print(entities)

# Expected output:
# [{'text': 'science fiction', 'label': 'category', 'score': 0.99},
#  {'text': 'Philip K. Dick', 'label': 'author', 'score': 1.00}]
```
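The `threshold` argument is the minimum confidence score a predicted span needs in order to be returned; raising it above the 0.2 used here filters out lower-confidence matches at the cost of recall.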
5. Training Details
- Base model: empathyai/gliner_large-v2.5-books
- Dataset: empathyai/books-ner-dataset-categories, a small dataset of synthetic English queries (categories only)
- Splits: 2127 train / 273 eval (duplicates removed)
- Script highlights (see the configuration sketch after this list):
  - Learning rate 5 × 10⁻⁶, linear schedule, 10 % warm-up
  - Batch size 32, gradient accumulation 2, focal loss α 0.75 / γ 2
  - 1 epoch
  - Gradient checkpointing + BF16 for memory efficiency
  - Trained on a single L40S; total wall time ≈ 2 min
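A minimal sketch of how these hyperparameters map onto `transformers.TrainingArguments`, for reference only; the focal-loss α/γ values are GLiNER-specific and are noted as a comment rather than as confirmed argument names:

```python
from transformers import TrainingArguments

# Hyperparameters as listed above, expressed with transformers.TrainingArguments.
training_args = TrainingArguments(
    output_dir="gliner_large-v2.5-books-extended",
    learning_rate=5e-6,              # 5 × 10⁻⁶
    lr_scheduler_type="linear",      # linear schedule
    warmup_ratio=0.1,                # 10 % warm-up
    per_device_train_batch_size=32,  # batch size 32
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    gradient_checkpointing=True,     # memory efficiency
    bf16=True,
    # Focal loss α=0.75 / γ=2 is configured on the GLiNER side; the exact
    # argument names (e.g. focal_loss_alpha / focal_loss_gamma) are an
    # assumption, not part of transformers.TrainingArguments.
)
```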
6. Limitations & Bias
- The category vocabulary comes from Project Gutenberg; genres outside that vocabulary may not be recognised.
- Only short, informal English queries were simulated. Long paragraphs or non‑English text may degrade accuracy.
- Does not tag publishers, dates, ISBNs, or other bibliographic fields.
7. Acknowledgements
Thanks to the GLiNER authors and maintainers; HuggingFace for hosting; Project Gutenberg volunteers for the free metadata.