Model Card for namednil/sip-d4
This model is pre-trained to take a representation of a Finite State Transducer (FST) and a string and predict the output of the FST for that string. The FSTs for pre-training were synthetically generated. The goal is to inject an inductive bias for FST-like tasks. Analysis of the model suggests that it has learned to internally simulate transitions between FST states in its hidden representations -- without being explicitly trained to do so.
See SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation (https://aclanthology.org/2024.acl-long.355/) for all the details.
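To make the pre-training task concrete, the sketch below simulates a toy deterministic FST in plain Python. The transition table and strings are invented for illustration and have nothing to do with the serialization actually fed to the model during pre-training (see the paper and repository for that); they only show the kind of input-to-output mapping the model is trained to predict.

```python
# Illustrative only: a toy deterministic FST, not the encoding used for pre-training.
# During pre-training, the model receives a serialized FST plus an input string
# and must predict the string that the FST outputs for that input.

def run_fst(transitions, start_state, final_states, s):
    """Run a deterministic FST given as {(state, input_char): (next_state, output_str)}."""
    state, output = start_state, []
    for ch in s:
        if (state, ch) not in transitions:
            return None  # no transition: the FST rejects the input
        state, out = transitions[(state, ch)]
        output.append(out)
    return "".join(output) if state in final_states else None

# Toy FST: copies the input, but rewrites every "a" after the first "b" to "b".
transitions = {
    (0, "a"): (0, "a"), (0, "b"): (1, "b"),
    (1, "a"): (1, "b"), (1, "b"): (1, "b"),
}
print(run_fst(transitions, start_state=0, final_states={0, 1}, s="aba"))  # -> "abb"
```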
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card was automatically generated.
- Developed by: Matthias Lindemann
- Funded by: UKRI, Huawei, Dutch National Science Foundation
- Model type: Sequence-to-Sequence model
- Language(s) (NLP): none; the continued pre-training uses only synthetic FSTs and strings, no natural language data
- License: [More Information Needed]
- Finetuned from model: ByT5 (google/byt5-small)
Model Sources
- Repository: https://github.com/namednil/sip
- Paper: https://aclanthology.org/2024.acl-long.355/
Uses
Direct Use
Without fine-tuning, the model can approximately simulate FST behavior when prompted with an FST and an input string (see also namednil/sip-d4-pt and the documentation in the git repository), as sketched below. The main intended use, however, is as a starting point for fine-tuning.
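As a rough sketch of direct use, the snippet below runs the pre-trained model as an ordinary 🤗 seq2seq model. The `fst_and_input` string is a placeholder: the actual serialization of the FST and the input string is not reproduced here and is described in the repository. The snippet also assumes the remote-code model class supports the standard `generate` API.

```python
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("google/byt5-small")
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", trust_remote_code=True)

# Placeholder: serialize an FST together with an input string as described in the repository.
fst_and_input = "..."

inputs = tokenizer(fst_and_input, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```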
Downstream Use
The model is intended to be fine-tuned on FST-like tasks such as grapheme-to-phoneme conversion or simple text editing, particularly in few-shot setups.
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
```python
import transformers, torch

tokenizer = transformers.AutoTokenizer.from_pretrained("google/byt5-small")
# Always make sure to check the remote code on Hugging Face!
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", trust_remote_code=True)

# Construct an optimizer that uses the SIP fine-tuning procedure:
optimizer = model.get_optimizer(torch.optim.Adam, prefix_lr=1.0, lr=3e-4)

# ... fine-tune the model as usual.

# The code above uses a random initialization of the tunable prefix of SIP.
# If you don't want that, or want more control over the length of the tunable prefix, run:
config = transformers.AutoConfig.from_pretrained("namednil/sip-d4", trust_remote_code=True)
config.random_selection = False
config.prefix_length = 50
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", config=config, trust_remote_code=True)
```
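For completeness, here is a minimal sketch of the "fine-tune as usual" step with the optimizer returned by `get_optimizer`. The two grapheme-to-phoneme pairs are invented toy data for illustration only, and the loop omits batching, evaluation, and the other details of the paper's experimental setup; it assumes the remote-code model follows the standard 🤗 seq2seq forward signature (input_ids, attention_mask, labels).

```python
import torch, transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("google/byt5-small")
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", trust_remote_code=True)
optimizer = model.get_optimizer(torch.optim.Adam, prefix_lr=1.0, lr=3e-4)

# Toy, invented grapheme-to-phoneme pairs; replace with your own task data.
train_pairs = [("cat", "kæt"), ("dog", "dɒɡ")]

model.train()
for epoch in range(10):
    for graphemes, phonemes in train_pairs:
        batch = tokenizer(graphemes, text_target=phonemes, return_tensors="pt")
        loss = model(**batch).loss  # standard seq2seq cross-entropy
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```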
Model Examination
See SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation (https://aclanthology.org/2024.acl-long.355/), which provides evidence that the model internally simulates FST state transitions in its hidden representations.
Environmental Impact
- Hardware Type: Nvidia RTX 2080 Ti
- Hours used: 30
- Compute Region: Scotland
- Carbon Emitted: 0.2 kg CO2eq
Citation
```bibtex
@inproceedings{lindemann-etal-2024-sip,
    title = "{SIP}: Injecting a Structural Inductive Bias into a {S}eq2{S}eq Model by Simulation",
    author = "Lindemann, Matthias and
      Koller, Alexander and
      Titov, Ivan",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.355/",
    doi = "10.18653/v1/2024.acl-long.355",
}
```