Model Card for namednil/sip-d4
This model is pre-trained to take a representation of a Finite State Transducer (FST) and a string and predict the output of the FST for that string. The FSTs for pre-training were synthetically generated. The goal is to inject an inductive bias for FST-like tasks. Analysis of the model suggests that it has learned to internally simulate transitions between FST states in its hidden representations -- without being explicitly trained to do so.
See SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation (https://aclanthology.org/2024.acl-long.355/) for all the details.
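To make the pre-training task concrete, the sketch below simulates a toy deterministic FST in plain Python. The transition table and strings are invented for illustration and have nothing to do with the serialization actually fed to the model during pre-training (see the paper and repository for that); they only show the kind of input-to-output mapping the model is trained to predict.

```python
# Illustrative only: a toy deterministic FST, not the encoding used for pre-training.
# During pre-training, the model receives a serialized FST plus an input string
# and must predict the string that the FST outputs for that input.

def run_fst(transitions, start_state, final_states, s):
    """Run a deterministic FST given as {(state, input_char): (next_state, output_str)}."""
    state, output = start_state, []
    for ch in s:
        if (state, ch) not in transitions:
            return None  # no transition: the FST rejects the input
        state, out = transitions[(state, ch)]
        output.append(out)
    return "".join(output) if state in final_states else None

# Toy FST: copies the input, but rewrites every "a" after the first "b" to "b".
transitions = {
    (0, "a"): (0, "a"), (0, "b"): (1, "b"),
    (1, "a"): (1, "b"), (1, "b"): (1, "b"),
}
print(run_fst(transitions, start_state=0, final_states={0, 1}, s="aba"))  # -> "abb"
```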
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card was automatically generated.
- Developed by: Matthias Lindemann
- Funded by: UKRI, Huawei, Dutch National Science Foundation
- Model type: Sequence-to-Sequence model
- Language(s) (NLP): none; the continued pre-training uses only synthetic FSTs and strings, no natural language data
- License: [More Information Needed]
- Finetuned from model: ByT5 (google/byt5-small)
Model Sources
- Repository: https://github.com/namednil/sip
- Paper: https://aclanthology.org/2024.acl-long.355/
Uses
Direct Use
Without fine-tuning, the model can approximately simulate FST behavior when prompted with an FST and an input string (see also namednil/sip-d4-pt and the documentation in the git repository), as sketched below. The main intended use, however, is as a starting point for fine-tuning.
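As a rough sketch of direct use, the snippet below runs the pre-trained model as an ordinary 🤗 seq2seq model. The `fst_and_input` string is a placeholder: the actual serialization of the FST and the input string is not reproduced here and is described in the repository. The snippet also assumes the remote-code model class supports the standard `generate` API.

```python
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("google/byt5-small")
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", trust_remote_code=True)

# Placeholder: serialize an FST together with an input string as described in the repository.
fst_and_input = "..."

inputs = tokenizer(fst_and_input, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```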
Downstream Use
The model is intended to be fine-tuned on FST-like tasks such as grapheme-to-phoneme conversion or simple text editing, particularly in few-shot setups.
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
```python
import transformers, torch

tokenizer = transformers.AutoTokenizer.from_pretrained("google/byt5-small")
# Always make sure to check the remote code on Hugging Face!
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", trust_remote_code=True)

# Construct an optimizer that uses the SIP fine-tuning procedure:
optimizer = model.get_optimizer(torch.optim.Adam, prefix_lr=1.0, lr=3e-4)

# ... fine-tune the model as usual.

# The code above uses a random initialization of the tunable prefix of SIP.
# If you don't want that, or want more control over the length of the tunable prefix, run:
config = transformers.AutoConfig.from_pretrained("namednil/sip-d4", trust_remote_code=True)
config.random_selection = False
config.prefix_length = 50
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", config=config, trust_remote_code=True)
```
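For completeness, here is a minimal sketch of the "fine-tune as usual" step with the optimizer returned by `get_optimizer`. The two grapheme-to-phoneme pairs are invented toy data for illustration only, and the loop omits batching, evaluation, and the other details of the paper's experimental setup; it assumes the remote-code model follows the standard 🤗 seq2seq forward signature (input_ids, attention_mask, labels).

```python
import torch, transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("google/byt5-small")
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", trust_remote_code=True)
optimizer = model.get_optimizer(torch.optim.Adam, prefix_lr=1.0, lr=3e-4)

# Toy, invented grapheme-to-phoneme pairs; replace with your own task data.
train_pairs = [("cat", "kæt"), ("dog", "dɒɡ")]

model.train()
for epoch in range(10):
    for graphemes, phonemes in train_pairs:
        batch = tokenizer(graphemes, text_target=phonemes, return_tensors="pt")
        loss = model(**batch).loss  # standard seq2seq cross-entropy
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```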
Model Examination
See SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation (https://aclanthology.org/2024.acl-long.355/), which provides evidence that the model internally simulates FST state transitions in its hidden representations.
Environmental Impact
- Hardware Type: Nvidia RTX 2080 Ti
- Hours used: 30
- Compute Region: Scotland
- Carbon Emitted: 0.2 kg CO2eq
Citation
```bibtex
@inproceedings{lindemann-etal-2024-sip,
    title = "{SIP}: Injecting a Structural Inductive Bias into a {S}eq2{S}eq Model by Simulation",
    author = "Lindemann, Matthias and
      Koller, Alexander and
      Titov, Ivan",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.355/",
    doi = "10.18653/v1/2024.acl-long.355",
}
```