namednil committed
Commit d24f453 · verified · 1 Parent(s): 8048ee5

Updated model card

Files changed (1)
  1. README.md +55 -128

README.md CHANGED
@@ -1,13 +1,17 @@
  ---
  library_name: transformers
- tags: []
  ---

  # Model Card for Model ID

  <!-- Provide a quick summary of what the model is/does. -->


  ## Model Details

@@ -17,21 +21,19 @@ tags: []

  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
  - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

- ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

  ## Uses

@@ -41,13 +43,13 @@ This is the model card of a 🤗 transformers model that has been pushed on the

  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- [More Information Needed]

- ### Downstream Use [optional]

  <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- [More Information Needed]

  ### Out-of-Scope Use

@@ -71,129 +73,54 @@ Users (both direct and downstream) should be made aware of the risks, biases and

  Use the code below to get started with the model.

- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]

- #### Summary

-
- ## Model Examination [optional]

  <!-- Relevant interpretability work for the model goes here -->

- [More Information Needed]

  ## Environmental Impact

  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
 
  ---
  library_name: transformers
+ base_model:
+ - google/byt5-small
  ---

  # Model Card for Model ID

  <!-- Provide a quick summary of what the model is/does. -->

+ This model is pre-trained to take a representation of a Finite State Transducer (FST) and a string and predict the output of the FST for that string. The FSTs for pre-training were synthetically generated.
+ The goal is to inject an inductive bias for FST-like tasks. Analysis of the model suggests that it has learned to internally simulate transitions between FST states in its hidden representations -- without being explicitly trained to do so.

+ See [SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation](https://aclanthology.org/2024.acl-long.355/) for all the details.
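
For readers unfamiliar with FSTs: a finite state transducer maps an input string to an output string by walking over transitions that each consume one input symbol and emit an output symbol. The toy example below is purely illustrative and unrelated to the encoding used during pre-training; it only shows the kind of string-to-string behavior the model is trained to predict when given a transducer and an input.

```python
# A toy single-state FST over the alphabet {a, b, n} that uppercases 'a'
# and copies 'b' and 'n'. Keys are (state, input symbol),
# values are (next state, output symbol). State 0 is start and accepting.
transitions = {
    (0, "a"): (0, "A"),
    (0, "b"): (0, "b"),
    (0, "n"): (0, "n"),
}

def run_fst(transitions, s, start_state=0):
    """Deterministically apply the transducer to an input string."""
    state, out = start_state, []
    for ch in s:
        state, emitted = transitions[(state, ch)]
        out.append(emitted)
    return "".join(out)

print(run_fst(transitions, "banana"))  # -> "bAnAnA"
```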
 
  ## Model Details
 
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

+ - **Developed by:** Matthias Lindemann
+ - **Funded by:** UKRI, Huawei, Dutch National Science Foundation
+ - **Model type:** Sequence-to-Sequence model
+ - **Language(s) (NLP):** no natural language data was used for continual pretraining
  - **License:** [More Information Needed]
+ - **Finetuned from model:** ByT5

+ ### Model Sources

  <!-- Provide the basic links for the model. -->

+ - **Repository:** https://github.com/namednil/sip
+ - **Paper:** https://aclanthology.org/2024.acl-long.355/

  ## Uses
 
  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ Without fine-tuning, the model can approximately simulate FST behavior (see also `namednil/sip-d4-pt` and the documentation in the git repo). The main use is in fine-tuning.

+ ### Downstream Use

  <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

+ FST-like tasks such as grapheme-to-phoneme conversion, or simple text editing in few-shot setups.
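
Since the underlying model is ByT5 and operates directly on bytes, such tasks can be framed as plain string-to-string pairs. Below is a minimal sketch of how a few-shot grapheme-to-phoneme batch could be prepared with the standard 🤗 tokenizer API; the example pairs and variable names are purely illustrative, and the repository linked above documents the authors' own setup.

```python
import transformers

# ByT5 works on raw UTF-8 bytes, so no task-specific vocabulary is needed.
tokenizer = transformers.AutoTokenizer.from_pretrained("google/byt5-small")

# Hypothetical few-shot grapheme-to-phoneme pairs (any string-to-string task looks the same).
pairs = [("cat", "kæt"), ("dog", "dɒɡ"), ("fish", "fɪʃ")]

batch = tokenizer(
    [graphemes for graphemes, _ in pairs],
    text_target=[phonemes for _, phonemes in pairs],
    padding=True,
    return_tensors="pt",
)
# Mask padding positions so they do not contribute to the loss during fine-tuning.
batch["labels"][batch["labels"] == tokenizer.pad_token_id] = -100
```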
 
  ### Out-of-Scope Use


  Use the code below to get started with the model.

+ ```python
+ import transformers, torch
+ tokenizer = transformers.AutoTokenizer.from_pretrained("google/byt5-small")
+ model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", trust_remote_code=True)
+ # (always make sure to check the remote code on Hugging Face!)
+
+ # Construct an optimizer that uses the SIP fine-tuning procedure:
+ optimizer = model.get_optimizer(torch.optim.Adam, prefix_lr=1.0, lr=3e-4)
+ # ... fine-tune the model as usual
+
+ # The above code uses a random initialization of the tunable prefix of SIP.
+ # If you don't want that and want more control over the length of the tunable prefix, run:
+
+ config = transformers.AutoConfig.from_pretrained("namednil/sip-d4", trust_remote_code=True)
+ config.random_selection = False
+ config.prefix_length = 50
+ model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", config=config, trust_remote_code=True)
+ ```
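
The comment "fine-tune the model as usual" above can be expanded into an ordinary seq2seq training loop. Below is a minimal sketch using the optimizer returned by `get_optimizer`, assuming the wrapper exposes the standard 🤗 seq2seq `forward`/`generate` interface; the toy task, batch construction, and hyperparameters are illustrative rather than the settings from the paper.

```python
import torch, transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("google/byt5-small")
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", trust_remote_code=True)
optimizer = model.get_optimizer(torch.optim.Adam, prefix_lr=1.0, lr=3e-4)

# Toy FST-like editing task: delete all vowels from the input string.
train_pairs = [("banana", "bnn"), ("transducer", "trnsdcr"), ("simulation", "smltn")]
batch = tokenizer([src for src, _ in train_pairs],
                  text_target=[tgt for _, tgt in train_pairs],
                  padding=True, return_tensors="pt")
batch["labels"][batch["labels"] == tokenizer.pad_token_id] = -100  # ignore padding in the loss

model.train()
for step in range(100):
    optimizer.zero_grad()
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()

# Standard generation with the fine-tuned model.
model.eval()
inputs = tokenizer(["coconut"], return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```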
 
+ ## Model Examination

  <!-- Relevant interpretability work for the model goes here -->

+ See [SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation](https://aclanthology.org/2024.acl-long.355/)
 
  ## Environmental Impact

  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

+ - **Hardware Type:** Nvidia RTX 2080 Ti
+ - **Hours used:** 30
+ - **Compute Region:** Scotland
+ - **Carbon Emitted:** 0.2 kg CO2eq
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{lindemann-etal-2024-sip,
+     title = "{SIP}: Injecting a Structural Inductive Bias into a {S}eq2{S}eq Model by Simulation",
+     author = "Lindemann, Matthias and
+       Koller, Alexander and
+       Titov, Ivan",
+     booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
+     month = aug,
+     year = "2024",
+     address = "Bangkok, Thailand",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/2024.acl-long.355/",
+     doi = "10.18653/v1/2024.acl-long.355",
+ }
+ ```