Books Named Entity Recognition Model (w/ Categories)
This model specialises in recognising book titles, author names, and book categories in short, user-typed queries. It extends the empathyai/gliner_large-v2.5-books model.
1. Provenance
This model was created by fine-tuning empathyai/gliner_large-v2.5-books on the empathyai/books-ner-dataset-categories dataset, which consists of synthetic queries derived from Project Gutenberg's categories. It inherits the base model's ability to identify titles and authors and extends it with a new `category` entity.
2. Use-Case
This model is a drop-in replacement for its predecessor, especially when your text stream consists of bibliographic requests that also include genres or categories, for example:

- “Looking for Dune from Frank Herbert.”
- “Any recommendations by Mary Shelley?”
- “Show me some science fiction books”
Typical applications:
- Query understanding in library / e-book search engines (see the sketch after this list)
- Post-processing LLM output to structure reading lists and categorize books
- Digital humanities pipelines that need lightweight title, author, and category extraction

Not suitable for: recognising publishers, ISBNs, or long bibliography-style references (only short queries were used for training).
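As a rough illustration of the query-understanding use case, the sketch below groups the model's predictions into a simple search filter keyed by entity type. It assumes the repository ID and labels listed in this card; the `query_to_filter` helper is purely illustrative:

```python
from collections import defaultdict

from gliner import GLiNER

# Repository ID as listed in this card; adjust if you host the model elsewhere.
model = GLiNER.from_pretrained("empathyai/gliner_large-v2.5-books-extended")

def query_to_filter(query: str, threshold: float = 0.2) -> dict:
    """Group predicted entities by label, e.g. {'author': [...], 'category': [...]}."""
    entities = model.predict_entities(query, ["title", "author", "category"], threshold=threshold)
    search_filter = defaultdict(list)
    for ent in entities:
        search_filter[ent["label"]].append(ent["text"])
    return dict(search_filter)

print(query_to_filter("Looking for Dune from Frank Herbert."))
# e.g. {'title': ['Dune'], 'author': ['Frank Herbert']}
```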
3. Performance
This model maintains the high performance of the base model for the `title` and `author` entities while adding strong performance on the new `category` entity.

Base Model Performance (`title`, `author`)
The performance for the inherited labels, evaluated on the original held-out dataset, is as follows:
| Metric | Overall | `title` | `author` |
|---|---|---|---|
| Precision | 0.9999 | 0.9999 | 0.9999 |
| Recall | 0.8583 | 0.7661 | 0.9287 |
| F1-score | 0.9237 | 0.8675 | 0.9630 |
| Support | 69,880 | 30,290 | 39,590 |
New `category` Entity Performance

Performance for the newly added `category` label was evaluated on the empathyai/books-ner-dataset-categories held-out set.
| Metric | `category` |
|---|---|
| Precision | 0.9831 |
| Recall | 1.0000 |
| F1-score | 0.9915 |
| Support | 232 |
4. Quick Start
```python
from gliner import GLiNER

# Make sure to use your new model's repository ID
model = GLiNER.from_pretrained("empathyai/gliner_large-v2.5-books-extended")

text = "Show me some science fiction books by Philip K. Dick."

entities = model.predict_entities(text, ["title", "author", "category"], threshold=0.2)
print(entities)

# Expected output:
# [{'text': 'science fiction', 'label': 'category', 'score': 0.99},
#  {'text': 'Philip K. Dick', 'label': 'author', 'score': 1.00}]
```
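The `threshold` argument is the minimum confidence score a predicted span needs in order to be returned; raising it above the 0.2 used here filters out lower-confidence matches at the cost of recall.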
5. Training Details
- Base model: empathyai/gliner_large-v2.5-books
- Dataset: empathyai/books-ner-dataset-categories, a small dataset of synthetic English queries (categories only)
- Splits: 2127 train / 273 eval (duplicates removed)
- Script highlights (see the configuration sketch after this list):
  - Learning rate 5 × 10⁻⁶, linear schedule, 10 % warm-up
  - Batch size 32, gradient accumulation 2, focal loss α 0.75 / γ 2
  - 1 epoch
  - Gradient checkpointing + BF16 for memory efficiency
  - Trained on a single L40S; total wall time ≈ 2 min
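A minimal sketch of how these hyperparameters map onto `transformers.TrainingArguments`, for reference only; the focal-loss α/γ values are GLiNER-specific and are noted as a comment rather than as confirmed argument names:

```python
from transformers import TrainingArguments

# Hyperparameters as listed above, expressed with transformers.TrainingArguments.
training_args = TrainingArguments(
    output_dir="gliner_large-v2.5-books-extended",
    learning_rate=5e-6,              # 5 × 10⁻⁶
    lr_scheduler_type="linear",      # linear schedule
    warmup_ratio=0.1,                # 10 % warm-up
    per_device_train_batch_size=32,  # batch size 32
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    gradient_checkpointing=True,     # memory efficiency
    bf16=True,
    # Focal loss α=0.75 / γ=2 is configured on the GLiNER side; the exact
    # argument names (e.g. focal_loss_alpha / focal_loss_gamma) are an
    # assumption, not part of transformers.TrainingArguments.
)
```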
6. Limitations & Bias
- The category vocabulary comes from Project Gutenberg; genres outside that vocabulary may not be recognised.
- Only short, informal English queries were simulated. Long paragraphs or non‑English text may degrade accuracy.
- Does not tag publishers, dates, ISBNs, or other bibliographic fields.
7. Acknowledgements
Thanks to the GLiNER authors and maintainers; HuggingFace for hosting; Project Gutenberg volunteers for the free metadata.