Books Named Entity Recognition Model (w/ Categories)

This model specialises in recognising book titles, author names, and book categories in short, user-typed queries. It extends the empathyai/gliner_large-v2.5-books model with a category label.

See this model in action in Gutenberg AI Search by Empathy.ai.


1. Provenance

This model was obtained by fine-tuning empathyai/gliner_large-v2.5-books on the empathyai/books-ner-dataset-categories dataset, which consists of synthetic queries derived from Project Gutenberg's categories. It inherits the base model's ability to identify titles and authors and adds the category entity.


2. Use-Case

This model is a drop-in replacement for its predecessor, especially when your text stream revolves around bibliographic requests that also include genres or categories:

  • “Looking for Dune from Frank Herbert.”
  • “Any recommendations by Mary Shelley?”
  • “Show me some science fiction books”

Typical applications:

  • Query understanding in library / e-book search engines
  • Post-processing LLM output to structure reading lists and categorize books
  • Digital humanities pipelines that need lightweight title, author, and category extraction

Not suitable for: recognising publishers, ISBNs or long BIB-style references (only short queries were used for training).
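As a rough illustration of the query-understanding use case, the sketch below groups extracted entities by label so a search backend can treat them as structured filters. It assumes entities arrive as dictionaries with `text`, `label`, and `score` keys, as in the Quick Start output; the helper name `group_entities` and the sample data are hypothetical, not part of the model or the GLiNER library.

```python
from collections import defaultdict


def group_entities(entities):
    """Group entity dicts by label, preserving the order in which they appear."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


# Hand-written sample in the shape of the Quick Start output (scores are illustrative).
sample = [
    {"text": "science fiction", "label": "category", "score": 0.99},
    {"text": "Philip K. Dick", "label": "author", "score": 1.00},
]

structured_query = group_entities(sample)
print(structured_query)
# {'category': ['science fiction'], 'author': ['Philip K. Dick']}
```

A search engine could then map `author` and `category` to facet filters and `title` to an exact-match lookup; that mapping is a design choice left to the application.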


3. Performance

This model keeps the base model's performance on the title and author entities (overall F1 0.9237 on the original held-out dataset) and reaches an F1 of 0.9915 on the new category entity.

Base Model Performance (title, author)

The performance for the inherited labels, evaluated on the original held-out dataset, is as follows:

| Metric    | Overall | title  | author |
|-----------|---------|--------|--------|
| Precision | 0.9999  | 0.9999 | 0.9999 |
| Recall    | 0.8583  | 0.7661 | 0.9287 |
| F1-score  | 0.9237  | 0.8675 | 0.9630 |
| Support   | 69,880  | 30,290 | 39,590 |

New category Entity Performance

Performance for the newly added category label was evaluated on the empathyai/books-ner-dataset-categories held-out set.

| Metric    | category |
|-----------|----------|
| Precision | 0.9831   |
| Recall    | 1.0000   |
| F1-score  | 0.9915   |
| Support   | 232      |
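The reported F1-scores can be cross-checked from the precision and recall figures, since F1 is their harmonic mean. The short sketch below recomputes each value from the numbers in this section; the function and variable names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the figures in this section.
reported = {
    "overall": (0.9999, 0.8583, 0.9237),
    "title": (0.9999, 0.7661, 0.8675),
    "author": (0.9999, 0.9287, 0.9630),
    "category": (0.9831, 1.0000, 0.9915),
}

for label, (p, r, f1) in reported.items():
    # Each recomputed value agrees with the reported one to four decimals.
    print(f"{label}: recomputed {f1_score(p, r):.4f}, reported {f1:.4f}")
```

Note that the near-perfect precision on title and author means the remaining error on those labels is almost entirely missed spans (recall), not false positives.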

4. Quick Start

from gliner import GLiNER

# Load the fine-tuned model from the Hugging Face Hub
model = GLiNER.from_pretrained("empathyai/gliner_large-v2.5-books-extended")

text = "Show me some science fiction books by Philip K. Dick."

entities = model.predict_entities(text, ["title", "author", "category"], threshold=0.2)
print(entities)

# Expected output:
# [{'text': 'science fiction', 'label': 'category', 'score': 0.99},
#  {'text': 'Philip K. Dick', 'label': 'author', 'score': 1.00}]
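The Quick Start call uses a single, fairly permissive threshold of 0.2 for all labels. If stricter filtering is wanted for some labels, one option is to post-filter the returned entities with per-label cut-offs. The sketch below is a minimal example under that assumption; the helper name, the threshold values, and the sample scores are hypothetical and should be tuned on your own queries.

```python
# Hypothetical per-label cut-offs; these values are not from the model authors.
LABEL_THRESHOLDS = {"title": 0.5, "author": 0.5, "category": 0.3}


def filter_by_label_threshold(entities, thresholds, default=0.5):
    """Keep entities whose score meets the cut-off for their label."""
    return [
        ent for ent in entities
        if ent["score"] >= thresholds.get(ent["label"], default)
    ]


# Hand-written sample in the shape returned by predict_entities (illustrative scores).
raw_entities = [
    {"text": "science fiction", "label": "category", "score": 0.99},
    {"text": "Philip K. Dick", "label": "author", "score": 1.00},
    {"text": "books", "label": "title", "score": 0.25},
]

kept = filter_by_label_threshold(raw_entities, LABEL_THRESHOLDS)
print([ent["text"] for ent in kept])
# ['science fiction', 'Philip K. Dick']
```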

5. Training details

  • Base model: empathyai/gliner_large-v2.5-books
  • Dataset: empathyai/books-ner-dataset-categories — Small dataset with synthetic English queries (categories only)
  • Splits: 2127 train / 273 eval (duplicates removed)
  • Script highlights
    • Learning rate 5 × 10⁻⁶, linear schedule, warm‑up 10 %
    • Batch 32, gradient accumulation 2, focal loss α 0.75 / γ 2
    • 1 epoch
    • Gradient checkpointing + BF16 for memory efficiency
    • Trained on a single L40S; total wall time ≈ 2 min
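To make the hyperparameters above more concrete, here is a minimal sketch of a binary focal loss with α = 0.75 and γ = 2, and of a linear schedule with 10 % warm-up peaking at 5 × 10⁻⁶. It uses the standard textbook formulations in plain Python; the actual training script is not shown in this card, so the exact implementation, the step count, and the function names here are assumptions.

```python
import math


def focal_loss(p, target, alpha=0.75, gamma=2.0):
    """Binary focal loss for one prediction; p is the predicted positive-class probability."""
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor down-weights examples the model already gets right.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def lr_at_step(step, total_steps, base_lr=5e-6, warmup_ratio=0.10):
    """Linear warm-up from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / decay_steps)


# Effective batch size: 32 samples per batch x 2 accumulation steps.
effective_batch = 32 * 2
# Assumption: one epoch over 2127 examples with no dropped final batch.
total_steps = math.ceil(2127 / effective_batch)

print(effective_batch, total_steps)
print(focal_loss(0.9, 1) < focal_loss(0.5, 1))  # confident correct predictions cost less
```

With so few optimizer steps in a single epoch, the warm-up phase covers only a handful of updates, which is consistent with the short wall time reported above.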

6. Limitations & bias

  • The vocabulary of categories comes from Project Gutenberg; categories outside that vocabulary were not seen during training and may be missed.
  • Only short, informal English queries were simulated. Long paragraphs or non‑English text may degrade accuracy.
  • Does not tag publishers, dates, ISBNs, or other bibliographic fields.

7. Acknowledgements

Thanks to the GLiNER authors and maintainers, to Hugging Face for hosting, and to the Project Gutenberg volunteers for the free metadata.
