---
library_name: transformers
datasets:
- s-nlp/EverGreen-Multilingual
language:
- ru
- en
- fr
- de
- he
- ar
- zh
base_model:
- intfloat/multilingual-e5-small
pipeline_tag: text-classification
---
# E5-EG-small

A lightweight multilingual model for temporal classification of questions, fine-tuned from [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small).
## Model Details

### Model Description
E5-EG-small (E5 EverGreen - Small) is an efficient multilingual text classification model that determines whether a question's answer is temporally mutable (changes over time) or immutable (stays true). It scores about 0.04 F1 below E5-EG-large in every language while being roughly 4.7x smaller and 3.8x faster (see the efficiency metrics below).
- **Model type:** Text Classification
- **Base model:** [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small)
- **Language(s):** Russian, English, French, German, Hebrew, Arabic, Chinese
- **License:** MIT
### Model Sources

- **Repository:** GitHub
- **Paper:** [Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA](https://arxiv.org/abs/2505.21115)
## How to Get Started with the Model
```python
from transformers import pipeline

# Load the classification pipeline
model_name = "s-nlp/E5-EverGreen-Multilingual-Small"
pipe = pipeline("text-classification", model=model_name)

# Batch classification example
questions = [
    "What is the capital of France?",
    "Who won the latest World Cup?",
    "What is the speed of light?",
    "What is the current Bitcoin price?",
    "How old is Elon Musk?",
    "How old was Leo Tolstoy when he died?",
]

# Classify the whole batch in one call
results = pipe(questions)
for question, result in zip(questions, results):
    print(f"{question} -> {result['label']} ({result['score']:.2f})")
```
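Each entry in `results` is a dict with `label` and `score` keys, the standard output of the `text-classification` pipeline; depending on the model's config, labels appear either as the class names (mutable/immutable, as listed under Evaluation) or as generic `LABEL_0`/`LABEL_1` identifiers.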
## Training Details

### Training Data

Same multilingual dataset as E5-EG-large:

- ~4,000 questions per language
- Balanced class distribution
- Augmented with synthetic and translated data
### Training Procedure

#### Preprocessing

- Identical to E5-EG-large
- Maximum sequence length: 64 tokens
- Multilingual tokenization (a minimal sketch follows)
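As a rough illustration (not the exact training preprocessing code), inputs can be tokenized with the base model's tokenizer under the 64-token limit like this:

```python
from transformers import AutoTokenizer

# Tokenizer inherited from the base model
tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-small")

batch = tokenizer(
    ["What is the capital of France?", "Who is the current US president?"],
    max_length=64,        # maximum sequence length used in training
    truncation=True,      # clip longer questions to 64 tokens
    padding=True,         # pad the batch to a common length
    return_tensors="pt",
)
```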
#### Training Hyperparameters

- **Training regime:** fp16 mixed precision
- **Epochs:** 10
- **Batch size:** 32
- **Learning rate:** 5e-05
- **Warmup steps:** 300
- **Weight decay:** 0.01
- **Optimizer:** AdamW
- **Loss function:** Focal Loss (γ=2.0, α=0.25) with class weighting (a sketch follows)
- **Gradient accumulation steps:** 1
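A minimal PyTorch sketch of a focal loss with class weighting, matching the γ and α values above; the exact implementation used in training may differ:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25, class_weights=None):
    """Focal loss (Lin et al., 2017): down-weights easy, well-classified
    examples so training focuses on hard ones. `class_weights` is an
    optional per-class weight tensor passed through to cross-entropy."""
    ce = F.cross_entropy(logits, targets, weight=class_weights, reduction="none")
    pt = torch.exp(-ce)  # ≈ probability assigned to the true class
    return (alpha * (1 - pt) ** gamma * ce).mean()

# Example: a batch of 4 logits over the two classes (mutable / immutable)
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 1, 0])
loss = focal_loss(logits, targets, class_weights=torch.tensor([1.0, 1.0]))
```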
#### Hardware

- **GPU:** Single NVIDIA V100
- **Training time:** ~2 hours
## Evaluation

### Testing Data

Same test sets as E5-EG-large (2,100 samples per language).
### Metrics

#### Per-Language F1 Scores

| Language | F1 Score | Δ vs Large |
|---|---|---|
| English | 0.88 | -0.04 |
| Chinese | 0.87 | -0.04 |
| French | 0.86 | -0.04 |
| German | 0.85 | -0.04 |
| Russian | 0.84 | -0.04 |
| Hebrew | 0.83 | -0.04 |
| Arabic | 0.82 | -0.04 |
#### Class-wise Performance

| Class | Precision | Recall | F1 |
|---|---|---|---|
| Immutable | 0.83 | 0.86 | 0.84 |
| Mutable | 0.86 | 0.83 | 0.84 |
#### Efficiency Metrics

| Metric | E5-EG-small | E5-EG-large | Improvement |
|---|---|---|---|
| Parameters | 118M | 560M | 4.7x smaller |
| Model Size (MB) | 471 | 2,240 | 4.8x smaller |
| Inference Time (ms) | 12 | 45 | 3.8x faster |
| Memory Usage (GB) | 0.8 | 3.2 | 4x less |
| Throughput (samples/sec) | 83 | 22 | 3.8x higher |
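Latency and throughput depend heavily on hardware and batch size, so treat the table as indicative. A rough sketch for measuring them on your own machine (the batch size of 32 here is an assumption mirroring training, not necessarily the benchmark setup):

```python
import time
from transformers import pipeline

pipe = pipeline("text-classification", model="s-nlp/E5-EverGreen-Multilingual-Small")
batch = ["What is the current Bitcoin price?"] * 32

# Warm-up run so model loading and kernel setup are not measured
pipe(batch)

n_runs = 10
start = time.perf_counter()
for _ in range(n_runs):
    pipe(batch)
elapsed = time.perf_counter() - start

print(f"Latency per sample: {1000 * elapsed / (n_runs * len(batch)):.1f} ms")
print(f"Throughput: {n_runs * len(batch) / elapsed:.1f} samples/sec")
```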
## Citation

**BibTeX:**

```bibtex
@misc{pletenev2025truetomorrowmultilingualevergreen,
      title={Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA},
      author={Sergey Pletenev and Maria Marina and Nikolay Ivanov and Daria Galimzianova and Nikita Krayko and Mikhail Salnikov and Vasily Konovalov and Alexander Panchenko and Viktor Moskvoretskii},
      year={2025},
      eprint={2505.21115},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21115},
}
```