Spaces: ucinlp/autoprompt


cbensimon HF Staff committed on May 27, 2021

Commit 861c889 · unverified · 0 Parent(s)

Initial commit

Files changed (25)

.circleci/config.yml +18 -0
.gitignore +66 -0
README.md +168 -0
app.py +580 -0
app/.streamlit/config.toml +10 -0
assets/icon.png +0 -0
assets/sst2_train.jsonl +32 -0
autoprompt/__init__.py +0 -0
autoprompt/create_trigger.py +523 -0
autoprompt/finetune.py +203 -0
autoprompt/label_search.py +162 -0
autoprompt/popsicle.py +134 -0
autoprompt/run_linear_probe.py +151 -0
autoprompt/utils.py +376 -0
prompts/fact_retrieval_bert_prompts.jsonl +41 -0
prompts/fact_retrieval_roberta_prompts.jsonl +41 -0
prompts/relation_extraction_bert_prompts.jsonl +39 -0
prompts/relation_extraction_roberta_prompts.jsonl +39 -0
pytest.ini +5 -0
requirements.txt +11 -0
scripts/run_fact_retrieval_example.sh +32 -0
scripts/run_relation_extraction_example.sh +33 -0
setup.py +29 -0
tests/test_create_trigger.py +63 -0
tests/test_utils.py +159 -0

.circleci/config.yml ADDED

	@@ -0,0 +1,18 @@

+version: 2
+jobs:
+    build:
+        working_directory: ~/autoprompt
+        docker:
+            - image: circleci/python:3.7
+        environment:
+            OMP_NUM_THREADS: 1
+        resource_class: medium
+        parallelism: 1
+        steps:
+            - checkout
+            - run: pip install --upgrade pip
+            - run: pip install -r requirements.txt
+            - run: python -m pytest --disable-warnings
+            - store_test_results:
+                path: test-results

.gitignore ADDED

	@@ -0,0 +1,66 @@

+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+# C extensions
+*.so
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*,cover
+# Translations
+*.mo
+*.pot
+# Django stuff:
+*.log
+# Sphinx documentation
+docs/_build/
+# PyBuilder
+target/
+# IPython checkpoints
+.ipynb_checkpoints
+# Miscellaneous
+.DS_Store
+.vscode/
+out/
+#data/

README.md ADDED

	@@ -0,0 +1,168 @@

+# AutoPrompt
+An automated method based on gradient-guided search to create prompts for a diverse set of NLP tasks. AutoPrompt demonstrates that masked language models (MLMs) have an innate ability to perform sentiment analysis, natural language inference, fact retrieval, and relation extraction. Check out our [website](https://ucinlp.github.io/autoprompt/) for the paper and more information.
+## Table of Contents
+* [Setup](#setup)
+* [Generating Prompts](#generating-prompts)
+* [Label Token Selection](#label-token-selection)
+* [Evaluation for Fact Retrieval and Relation Extraction](#evaluation-for-fact-retrieval-and-relation-extraction)
+* [Citation](#citation)
+## Setup
+### 1. Create conda environment
+```
+conda create -n autoprompt -y python=3.7 && conda activate autoprompt
+```
+### 2. Install dependencies
+Install the required packages
+```
+pip install -r requirements.txt
+```
+Also download the spacy model
+```
+python -m spacy download en
+```
+### 3. Download the data
+The datasets for sentiment analysis, NLI, fact retrieval, and relation extraction are available to download [here](https://drive.google.com/drive/folders/1vVhgnSXmbuJb6GLPn_FErY1xDTh1xyv-?usp=sharing).
+There are a couple of different datasets for fact retrieval and relation extraction, so here is a brief overview of each:
+- Fact Retrieval
+  - `original`: We used the T-REx subset provided by LAMA as our test set and gathered more facts from the [original T-REx dataset](https://hadyelsahar.github.io/t-rex/) that we partitioned into train and dev sets
+  - `original_rob`: We filtered facts in `original` so that each object is a single token for both BERT and RoBERTa
+  - `trex`: We split the extra T-REx data collected (for train/val sets of `original`) into train, dev, test sets
+- Relation Extraction
+  - Trimmed the `original` dataset to compensate for both the [RE baseline](https://github.com/UKPLab/emnlp2017-relation-extraction) and RoBERTa. We also excluded relations `P527` and `P1376` because the RE baseline doesn’t consider them.
+## Generating Prompts
+### Quick Overview of Templates
+A prompt is constructed by mapping things like the original input and trigger tokens to a template that looks something like
+`[CLS] {sub_label} [T] [T] [T] [P]. [SEP]`
+The example above is a template for generating fact retrieval prompts with 3 trigger tokens where `{sub_label}` is a placeholder for the subject in any (subject, relation, object) triplet in fact retrieval. `[P]` denotes the placement of a special `[MASK]` token that will be used to "fill-in-the-blank" by the language model. Each trigger token in the set of trigger tokens that are shared across all prompts is denoted by `[T]`.
+Depending on the language model (i.e. BERT or RoBERTa) you choose to generate prompts, the special tokens will be different. For BERT, stick `[CLS]` and `[SEP]` to each end of the template. For RoBERTa, use `<s>` and `</s>` instead.
+### Sentiment Analysis
+```
+python -m autoprompt.create_trigger \
+    --train glue_data/SST-2/train.tsv \
+    --dev glue_data/SST-2/dev.tsv \
+    --template '<s> {sentence} [T] [T] [T] [P] . </s>' \
+    --label-map '{"0": ["Ġworse", "Ġincompetence", "ĠWorse", "Ġblamed", "Ġsucked"], "1": ["ĠCris", "Ġmarvelous", "Ġphilanthrop", "Ġvisionary", "Ġwonderful"]}' \
+    --num-cand 100 \
+    --accumulation-steps 30 \
+    --bsz 24 \
+    --eval-size 48 \
+    --iters 180 \
+    --model-name roberta-large
+```
+### Natural Language Inference
+```
+python  -m autoprompt.create_trigger  --train SICK_TRAIN_ALL_S.tsv --dev SICK_DEV_ALL_S.tsv --template '<s> {sentence_A} [P] [T] [T] [T] [T] {sentence_B} </s>'  --label-map '{"ENTAILMENT": ["\u0120Taiwan", "\u0120Ara", "abet"], "CONTRADICTION": ["\u0120Only", "\u0120Didn", "\u0120BUT"], "NEUTRAL": ["icy", "oder", "agna"]}' --bsz 120  --model-name roberta-large
+```
+### Fact Retrieval
+```
+python -m autoprompt.create_trigger \
+    --train $path/train.jsonl \
+    --dev $path/dev.jsonl \
+    --template '<s> {sub_label} [T] [T] [T] [P] . </s>' \
+    --num-cand 10 \
+    --accumulation-steps 1 \
+    --model-name roberta-large \
+    --bsz 56 \
+    --eval-size 56 \
+    --iters 1000 \
+    --label-field 'obj_label' \
+    --tokenize-labels \
+    --filter \
+    --print-lama
+```
+### Relation Extraction
+```
+python -m autoprompt.create_trigger \
+    --train $path/train.jsonl \
+    --dev $path/dev.jsonl \
+    --template '[CLS] {context} [SEP] {sub_label} [T] [T] [T] [P] . [SEP]' \
+    --num-cand 10 \
+    --accumulation-steps 1 \
+    --model-name bert-base-cased \
+    --bsz 32 \
+    --eval-size 32 \
+    --iters 500 \
+    --label-field 'obj_label' \
+    --tokenize-labels \
+    --filter \
+    --print-lama \
+    --use-ctx
+```
+## Label Token Selection
+For sentiment analysis
+```
+python -m autoprompt.label_search --train ../data/SST-2/train.tsv --template '[CLS] {sentence} [T] [T] [T] [P]. [SEP]' --label-map '{"0": 0, "1": 1}' --iters 50 --model-name 'bert-base-cased'
+```
+For NLI
+```
+python -m autoprompt.label_search --train ../data/SICK-E-balanced/3-balance/SICK_TRAIN_ALL_S.tsv --template '[CLS] {sentence} [T] [T] [T] [P]. [SEP]' --label-map '{"entailment": 0, "contradiction": 1, "neutral": 2}' --iters 50 --model-name 'bert-base-cased'
+```
+## Evaluation for Fact Retrieval and Relation Extraction
+### 1. Set up LAMA
+Clone [our fork](https://github.com/taylorshin/LAMA) of the LAMA repo and follow the directions to set it up outside of the AutoPrompt repo.
+We recommend creating a separate conda environment for LAMA due to different dependencies and requirements.
+Copy the AutoPrompt data folder into the `data` directory of LAMA or set `data_path_pre` in `scripts/run_experiments.py` to a custom data location.
+In order to get LAMA to work with RoBERTa, run the following commands:
+```
+mkdir pre-trained_language_models/roberta
+cd pre-trained_language_models/roberta
+curl -O https://dl.fbaipublicfiles.com/fairseq/models/roberta.large.tar.gz
+tar -xvzf roberta.large.tar.gz
+```
+### 2. Update prompts
+Update the `data/relations.jsonl` file with your own automatically generated prompts.
+### 3. Configure settings
+To change evaluation settings, go to `scripts/run_experiments.py` and update the configurable values accordingly.
+Note: each of the configurable settings is marked with a `[CONFIGURABLE]` comment.
+- Uncomment the settings of the LM you want to evaluate with (and comment out the other LM settings) in the `LMs` list at the top of the file
+- Update the `common_vocab_filename` field to the appropriate file. Anything evaluating both BERT and RoBERTa requires this field to be `common_vocab_cased_rob.txt` instead of the usual `common_vocab_cased.txt`.
+- Set `use_ctx` to `True` if running evaluation for Relation Extraction
+- Set `synthetic` to `True` for perturbed sentence evaluation for Relation Extraction
+- In `get_TREx_parameters` function, set `data_path_pre` to the corresponding data path (e.g. `"../data/relation_extraction"` for Relation Extraction)
+### 4. Evaluate prompts
+Run the evaluation code
+```
+python scripts/run_experiments.py
+```
+### 5. Miscellaneous
+Set `PYTHONPATH` if the following error occurs: `ModuleNotFoundError: No module named 'lama'`
+```
+export PYTHONPATH="${PYTHONPATH}:/path/to/the/AutoPrompt/repo"
+```
+## Citation
+```
+@inproceedings{autoprompt:emnlp20,
+  author = {Taylor Shin and Yasaman Razeghi and Robert L. Logan IV and Eric Wallace and Sameer Singh},
+  title = { {AutoPrompt}: Eliciting Knowledge from Language Models with Automatically Generated Prompts },
+  booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
+  year = {2020}
+}
+```
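
Reviewer note on the README's "Quick Overview of Templates": the placeholder convention (`{field}` for task inputs, `[T]` for trigger tokens, `[P]` for the predict token) can be illustrated with a small string-level sketch. This is not the repository's `utils.TriggerTemplatizer`, which operates on token ids; the helper name `fill_template` and the trigger words below are hypothetical and chosen only for readability.

```python
import re


def fill_template(template, fields, triggers, mask_token="[MASK]"):
    """Fill {field} placeholders, then each [T] in order, then [P] with the mask token."""
    text = template.format(**fields)
    if text.count("[T]") != len(triggers):
        raise ValueError("number of triggers must match the number of [T] placeholders")
    trigger_iter = iter(triggers)
    # Replace the [T] placeholders left to right with the supplied trigger tokens.
    text = re.sub(r"\[T\]", lambda _match: next(trigger_iter), text)
    return text.replace("[P]", mask_token)


# Template copied from the README; the subject and triggers are made-up examples.
template = "[CLS] {sub_label} [T] [T] [T] [P]. [SEP]"
prompt = fill_template(template, {"sub_label": "Paris"}, ["is", "located", "in"])
print(prompt)
```

For a RoBERTa template, the same sketch would apply with `<s>` and `</s>` in place of `[CLS]` and `[SEP]`, and `mask_token="<mask>"`.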

app.py ADDED

	@@ -0,0 +1,580 @@

+import csv
+from dataclasses import dataclass
+import io
+import json
+import logging
+import random
+import sys
+from typing import Dict, List
+import pandas as pd
+import streamlit as st
+import torch
+import transformers
+from tqdm import tqdm
+from autoprompt import utils
+import autoprompt.create_trigger as ct
+# logging.getLogger("streamlit.caching").addHandler(logging.StreamHandler(sys.stdout))
+# logging.getLogger("streamlit.caching").setLevel(logging.DEBUG)
+logger = logging.getLogger(__name__)
+with open('assets/sst2_train.jsonl', 'r') as f:
+    DEFAULT_TRAIN = [json.loads(line) for line in f]
+@dataclass
+class CacheTest:
+    """
+    Stores whether the train button has been pressed for a given
+    set of inputs to run_autoprompt.
+    """
+    is_test: bool
+class CacheMiss(Exception):
+    pass
+def css_hack():
+    """
+    Inject some style into this app. ヽ(⌐■_■)ノ
+    """
+    st.markdown(
+        """
+        <style>
+            code {
+                color: #eec66d;
+            }
+            .css-gtmd9c a {
+                color: #6f98af;
+            }
+        </style>
+        """,
+        unsafe_allow_html=True
+    )
+# Setting eq and frozen ensures that a __hash__ method is generated which is needed for caching to
+# properly respond to changed args.
+@dataclass(eq=True, frozen=True)
+class Args:
+    # Configurable
+    template: str
+    model_name: str
+    iters: int
+    num_cand: int
+    accumulation_steps: int
+    # Non-Configurable
+    seed = 0
+    sentence_size = 64
+    tokenize_labels = True
+    filter = False
+    initial_trigger = None
+    label_field = "label"
+    bsz = 32
+    eval_size = 1
+    @classmethod
+    def from_streamlit(cls):
+        st.sidebar.image('assets/icon.png', width=150)
+        st.sidebar.markdown('### Training Parameters')
+        model_name = st.sidebar.selectbox(
+            "Model",
+            options=['roberta-large', 'bert-base-cased'],
+            help="Language model used for training and evaluation."
+        )
+        iters = int(st.sidebar.number_input(
+            "Iterations",
+            value=10,
+            min_value=1,
+            max_value=100,
+            help="Number of trigger search iterations. Larger values may yield better results."
+        ))
+        num_cand = int(st.sidebar.number_input(
+            "Number of Candidates",
+            value=25,
+            min_value=1,
+            max_value=100,
+            help="Number of candidate trigger token replacements to evaluate during each search "
+                 "iteration. Larger values may yield better results."
+        ))
+        accumulation_steps = int(st.sidebar.number_input(
+            "Gradient Accumulation Steps",
+            value=1,
+            min_value=1,
+            max_value=10,
+            help="Number of gradient accumulation steps used during training. Larger values may yield "
+                 "better results. Cannot be larger than half the dataset size."
+        ))
+        st.sidebar.markdown(
+            """
+            ### Template
+            Templates define how task-specific inputs are combined with trigger tokens to create
+            the prompt. They should contain the following placeholders:
+            - `{sentence}`: Placeholders for the task-specific input fields contain the field name
+              between curly brackets. For manually entered data the field name is `{sentence}`. For
+              uploaded csv's, field names should correspond to columns in the csv.
+            - `[T]`: Placeholder for a trigger token. These are learned from the training data.
+            - `[P]`: Placeholder for where to insert the [MASK] token that the model will predict
+              on.
+            Templates can also include manually written text (such as the
+            period in the default example below).
+            """
+        )
+        template = st.sidebar.text_input("Template", "{sentence} [T] [T] [T] [P].")
+        return cls(
+            template=template,
+            model_name=model_name,
+            iters=iters,
+            num_cand=num_cand,
+            accumulation_steps=accumulation_steps,
+        )
+# TODO(rloganiv): This probably could use a better name...
+@dataclass
+class GlobalData:
+    device: torch.device
+    config: transformers.PretrainedConfig
+    model: transformers.PreTrainedModel
+    tokenizer: transformers.PreTrainedTokenizer
+    embeddings: torch.nn.Module
+    embedding_gradient: ct.GradientStorage
+    predictor: ct.PredictWrapper
+    @classmethod
+    @st.cache(allow_output_mutation=True)
+    def from_pretrained(cls, model_name):
+        logger.info(f'Loading pretrained model: {model_name}')
+        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+        if torch.cuda.is_available():
+            st.write('CUDA is available')
+        else:
+            st.write('CUDA not available')
+        config, model, tokenizer = ct.load_pretrained(model_name)
+        model.to(device)
+        embeddings = ct.get_embeddings(model, config)
+        embedding_gradient = ct.GradientStorage(embeddings)
+        predictor = ct.PredictWrapper(model)
+        return cls(
+            device,
+            config,
+            model,
+            tokenizer,
+            embeddings,
+            embedding_gradient,
+            predictor
+        )
+@dataclass
+class Dataset:
+    train: List[Dict[str, str]]
+    label_map: Dict[str, str]
+def load_trigger_dataset(dataset, templatizer):
+    instances = []
+    for x in dataset:
+        instances.append(templatizer(x))
+    return instances
+@st.cache(suppress_st_warning=True, allow_output_mutation=True, hash_funcs={CacheTest: lambda o: 0})
+def run_autoprompt(args, dataset, cache_test):
+    if cache_test.is_test:
+        raise CacheMiss()
+    ct.set_seed(args.seed)
+    global_data = GlobalData.from_pretrained(args.model_name)
+    templatizer = utils.TriggerTemplatizer(
+        args.template,
+        global_data.config,
+        global_data.tokenizer,
+        label_field=args.label_field,
+        label_map=dataset.label_map,
+        tokenize_labels=args.tokenize_labels,
+        add_special_tokens=True,
+    )
+    evaluation_fn = ct.AccuracyFn(global_data.tokenizer, dataset.label_map, global_data.device,
+                                  tokenize_labels=args.tokenize_labels)
+    # Do not allow for initial trigger specification.
+    trigger_ids = [global_data.tokenizer.mask_token_id] * templatizer.num_trigger_tokens
+    trigger_ids = torch.tensor(trigger_ids, device=global_data.device).unsqueeze(0)
+    best_trigger_ids = trigger_ids.clone()
+    # Load datasets
+    logger.info('Loading datasets')
+    collator = utils.Collator(pad_token_id=global_data.tokenizer.pad_token_id)
+    try:
+        train_dataset = load_trigger_dataset(dataset.train, templatizer)
+    except KeyError as e:
+        raise RuntimeError(
+            'A field in your template is not present in the uploaded dataset. '
+            f'Check that there is a column with the name: {e}'
+        )
+    train_loader = torch.utils.data.DataLoader(
+        train_dataset, batch_size=args.bsz, shuffle=True, collate_fn=collator)
+    progress = st.progress(0.0)
+    trigger_placeholder = st.empty()
+    best_dev_metric = -float('inf')
+    for i in range(args.iters):
+        logger.info(f'Iteration: {i}')
+        progress.progress(float(i)/args.iters)
+        current_trigger = ','.join(global_data.tokenizer.convert_ids_to_tokens(best_trigger_ids.squeeze(0)))
+        trigger_placeholder.markdown(f'**Current trigger**: {current_trigger}')
+        global_data.model.zero_grad()
+        train_iter = iter(train_loader)
+        averaged_grad = None
+        # Compute gradient of loss
+        for step in range(args.accumulation_steps):
+            try:
+                model_inputs, labels = next(train_iter)
+            except StopIteration:
+                logger.warning(
+                    'Insufficient data for number of accumulation steps. '
+                    'Effective batch size will be smaller than specified.'
+                )
+                break
+            model_inputs = {k: v.to(global_data.device) for k, v in model_inputs.items()}
+            labels = labels.to(global_data.device)
+            predict_logits = global_data.predictor(model_inputs, trigger_ids)
+            loss = ct.get_loss(predict_logits, labels).mean()
+            loss.backward()
+            grad = global_data.embedding_gradient.get()
+            bsz, _, emb_dim = grad.size()
+            selection_mask = model_inputs['trigger_mask'].unsqueeze(-1)
+            grad = torch.masked_select(grad, selection_mask)
+            grad = grad.view(bsz, templatizer.num_trigger_tokens, emb_dim)
+            if averaged_grad is None:
+                averaged_grad = grad.sum(dim=0) / args.accumulation_steps
+            else:
+                averaged_grad += grad.sum(dim=0) / args.accumulation_steps
+        logger.info('Evaluating Candidates')
+        pbar = tqdm(range(args.accumulation_steps))
+        train_iter = iter(train_loader)
+        token_to_flip = i % templatizer.num_trigger_tokens
+        candidates = ct.hotflip_attack(averaged_grad[token_to_flip],
+                                       global_data.embeddings.weight,
+                                       increase_loss=False,
+                                       num_candidates=args.num_cand)
+        current_score = 0
+        candidate_scores = torch.zeros(args.num_cand, device=global_data.device)
+        denom = 0
+        for step in pbar:
+            try:
+                model_inputs, labels = next(train_iter)
+            except StopIteration:
+                logger.warning(
+                    'Insufficient data for number of accumulation steps. '
+                    'Effective batch size will be smaller than specified.'
+                )
+                break
+            model_inputs = {k: v.to(global_data.device) for k, v in model_inputs.items()}
+            labels = labels.to(global_data.device)
+            with torch.no_grad():
+                predict_logits = global_data.predictor(model_inputs, trigger_ids)
+                eval_metric = evaluation_fn(predict_logits, labels)
+            # Update current score
+            current_score += eval_metric.sum()
+            denom += labels.size(0)
+            # NOTE: Instead of iterating over tokens to flip we randomly change just one each
+            # time so the gradients don't get stale.
+            for i, candidate in enumerate(candidates):
+                # if candidate.item() in filter_candidates:
+                #     candidate_scores[i] = -1e32
+                #     continue
+                temp_trigger = trigger_ids.clone()
+                temp_trigger[:, token_to_flip] = candidate
+                with torch.no_grad():
+                    predict_logits = global_data.predictor(model_inputs, temp_trigger)
+                    eval_metric = evaluation_fn(predict_logits, labels)
+                candidate_scores[i] += eval_metric.sum()
+        if (candidate_scores >= current_score).any():
+            logger.info('Better trigger detected.')
+            best_candidate_score = candidate_scores.max()
+            best_candidate_idx = candidate_scores.argmax()
+            trigger_ids[:, token_to_flip] = candidates[best_candidate_idx]
+            logger.info(f'Train metric: {best_candidate_score / (denom + 1e-13): 0.4f}')
+        # Skip eval
+        best_trigger_ids = trigger_ids.clone()
+    progress.progress(1.0)
+    current_trigger = ','.join(global_data.tokenizer.convert_ids_to_tokens(best_trigger_ids.squeeze(0)))
+    trigger_placeholder.markdown(f'**Current trigger**: {current_trigger}')
+    best_trigger_tokens = global_data.tokenizer.convert_ids_to_tokens(best_trigger_ids.squeeze(0))
+    train_output = predict_test(map(lambda x: x['sentence'], dataset.train), dataset.label_map,
+                                templatizer, best_trigger_ids, global_data.tokenizer, global_data.predictor, args)
+    # Streamlit does not like accessing widgets across functions, which is
+    # problematic for this "live updating" widget which we want to still
+    # display even if the train output is cached. To get around this, we're
+    # going to delete the widget and replace it with a very similar looking
+    # widget outside the function...no one will ever notice ;)
+    trigger_placeholder.empty()
+    return (
+        best_trigger_tokens,
+        current_score/denom,
+        dataset.label_map,
+        templatizer,
+        best_trigger_ids,
+        global_data.tokenizer,
+        global_data.predictor,
+        args,
+        train_output
+    )
+def predict_test(sentences, label_map, templatizer, best_trigger_ids, tokenizer, predictor, args):
+    # Evaluate clean
+    output = { 'sentences': [] }
+    any_label = None
+    for label in label_map.values():
+        output[label] = []
+        any_label = label
+    output['prompt'] = []
+    for sentence in sentences:
+        model_inputs, _ = templatizer({'sentence': sentence, 'label': any_label})
+        model_inputs = {k: v.to(best_trigger_ids.device) for k, v in model_inputs.items()}
+        prompt_ids = ct.replace_trigger_tokens(
+            model_inputs, best_trigger_ids, model_inputs['trigger_mask'])
+        prompt = ' '.join(tokenizer.convert_ids_to_tokens(prompt_ids['input_ids'][0]))
+        output['prompt'].append(prompt)
+        predict_logits = predictor(model_inputs, best_trigger_ids)
+        output['sentences'].append(sentence)
+        for label in label_map.values():
+            label_id = utils.encode_label(tokenizer=tokenizer, label=label, tokenize=args.tokenize_labels)
+            label_id = label_id.to(best_trigger_ids.device)
+            label_loss = ct.get_loss(predict_logits, label_id)
+            # st.write(sentence, label, label_loss)
+            output[label].append(label_loss.item())
+    return output
+def manual_dataset(use_defaults):
+    num_train_instances = st.slider("Number of Train Instances", 4, 32, 8)
+    any_empty = False
+    dataset = []
+    data_col, label_col = st.beta_columns([3,1])
+    for i in range(num_train_instances):
+        default_data = DEFAULT_TRAIN[i]['sentence'] if use_defaults else ''
+        default_label = DEFAULT_TRAIN[i]['label'] if use_defaults else ''
+        with data_col:
+            data = st.text_input("Train Instance " + str(i+1), default_data)
+        with label_col:
+            label = st.text_input("Train Label " + str(i+1), default_label, max_chars=20)
+        if data == "" or label == "":
+            any_empty = True
+        dataset.append({'sentence': data, 'label': label})
+    label_set = list(set(map(lambda x: x['label'], dataset)))
+    label_idx = {x: i for i, x in enumerate(label_set)}
+    label_map = dict(map(lambda x: (x, x), label_set))
+    if any_empty:
+        st.warning('Waiting for data to be added')
+        st.stop()
+    if len(label_set) < 2:
+        st.warning('Not enough labels')
+        st.stop()
+    return Dataset(
+        train=dataset,
+        label_map=label_map
+    )
+def csv_dataset():
+    st.markdown("""
+        Please upload your training and evaluation csv files.
+        Format restrictions:
+        - The file is required to have a header
+        - The column name of the output field should be `label`.
+        - Each file should contain no more than 64 rows.
+    """)
+    train_csv = st.file_uploader('Train', accept_multiple_files=False)
+    if train_csv is None:
+        st.stop()
+    with io.StringIO(train_csv.getvalue().decode('utf-8')) as f:
+        reader = csv.DictReader(f)
+        train_dataset = list(reader)
+    if len(train_dataset) > 64:
+        raise ValueError('Train dataset is too large. Please limit the number '
+                         'of examples to 64 or fewer.')
+    labels = set(x['label'] for x in train_dataset)
+    label_map = {x: x for x in labels}
+    return Dataset(
+        train=train_dataset,
+        label_map=label_map
+    )
+def run():
+    css_hack()
+    st.title('AutoPrompt Demo')
+    st.markdown('''
+    For many years, the predominant approach for training machine learning
+    models to solve NLP tasks has been to use supervised training data to
+    estimate model parameters using maximum likelihood estimation or some
+    similar paradigm.  Whether fitting a logistic regression model over a
+    bag-of-words, an LSTM over a sequence of GloVe embeddings, or finetuning a
+    language model such as ELMo or BERT, the approach is essentially the same.
+    However, as language models have become more and more capable of accurately
+    generating plausible text a new possibility for solving classification
+    tasks has emerged...
+    ## Prompting
+    Prompting is the method of converting classification tasks into
+    *fill-in-the-blanks* problems that can be solved by a language model **without
+    modifying the model's internals**. For example, to perform sentiment analysis,
+    we may take the sentence we wish to classify and append the text "Overall, this
+    movie was ____." and feed it into a language model like so:
+    ''')
+    # st.image('assets/bert-mouth.png', use_column_width=True)
+    st.markdown('''
+    By measuring whether the language model assigns a higher probability to
+    words that are associated with a **positive** sentiment ("good", "great",
+    and "fantastic") vs. words that are associated with a **negative**
+    sentiment ("bad", "terrible", or "awful") we can infer the
+    predicted label for the given input. So in this example, because the word "good"
+    has a higher probability than "bad", the predicted label is **positive**.
+    ## AutoPrompt
+    One issue that arises when using prompts is that it is not usually clear
+    how to best pose a task as a fill-in-the-blanks problem in a way that gets
+    the most performance from the language model. Even for a simple problem
+    like sentiment analysis, we don't know whether it is better to ask whether
+    a movie is good/bad, or whether you feel great/terrible about it, and for
+    more abstract problems like natural language inference it is difficult to
+    even know where to start.
+    To cure this writer's block we introduce **AutoPrompt**, a data-driven
+    approach for automatic prompt construction. The basic idea is
+    straightforward: instead of writing a prompt, a user need only write a
+    **template** that specifies where the *task inputs* go along with placeholders for
+    a number of *trigger tokens* that will automatically be learned by the
+    model and the *predict token* that the model will fill in:
+    ''')
+    # st.image('assets/template.png', use_column_width=True)
+    st.markdown(
+    '''
+    In each iteration of the search process:
+    1. The template is instantiated using a batch of training inputs.
+    2. The loss of the model on each input is measured and used to identify a
+    number of candidate replacements for the current trigger tokens.
+    3. The performance of each candidate is measured on another batch of
+    training data, and the best performing candidate is used in the next
+    iteration.
+    ### Demo
+    To give a better sense of how AutoPrompt works, we have provided a simple
+    interactive demo. You can generate a prompt using the training data we have
+    pre-populated for you, or alternatively write your own training/evaluation
+    instances or upload them using a csv below. In addition, you can vary
+    some of the training parameters, as well as the template using the sidebar
+    on the left.
+    '''
+    )
+    args = Args.from_streamlit()
+    dataset_mode = st.radio('How would you like to input your training data?',
+                            options=['Example Data', 'Manual Input', 'From CSV'])
+    if dataset_mode == 'Example Data':
+        dataset = manual_dataset(use_defaults=True)
+    elif dataset_mode == 'Manual Input':
+        dataset = manual_dataset(use_defaults=False)
+    else:
+        dataset = csv_dataset()
+    button = st.empty()
+    clicked = button.button('Train')
+    if clicked:
+        trigger_tokens, eval_metric, label_map, templatizer, best_trigger_ids, tokenizer, predictor, args, train_output = run_autoprompt(args, dataset, cache_test=CacheTest(False))
+    else:
+        try:
+            trigger_tokens, eval_metric, label_map, templatizer, best_trigger_ids, tokenizer, predictor, args, train_output = run_autoprompt(args, dataset, cache_test=CacheTest(True))
+        except CacheMiss:
+            st.stop()
+        else:
+            button.empty()
+    st.markdown(f'**Final trigger**: {", ".join(trigger_tokens)}')
+    st.dataframe(pd.DataFrame(train_output).style.highlight_min(axis=1, color='#94666b'))
+    logger.debug('Dev metric')
+    st.write('Accuracy: ' + str(round(eval_metric.item()*100, 1)))
+    st.write("""
+    Et voila, you've now effectively finetuned a classifier using just a few
+    kilobytes of parameters (the tokens in the prompt). If you like you can
+    write down your "model" on the back of a napkin and take it with you.
+    ### Try it out yourself!
+    """)
+    sentence = st.text_input("Sentence", 'Enter a test input here')
+    pred_output = predict_test([sentence], label_map, templatizer, best_trigger_ids, tokenizer, predictor, args)
+    st.dataframe(pd.DataFrame(pred_output).style.highlight_min(axis=1, color='#94666b'))
+    st.markdown('''
+    ## Where can I learn more?
+    If you are interested in learning more about AutoPrompt we recommend
+    [reading our paper](https://arxiv.org/abs/2010.15980) and [checking out our
+    code](https://github.com/ucinlp/autoprompt), or if you'd like you can also
+    watch our presentation at EMNLP 2020:
+    ''')
+    st.components.v1.iframe(
+        src="https://www.youtube.com/embed/IBMT_oOCBbc",
+        height=400,
+    )
+    st.markdown('Thanks!')
+if __name__ == '__main__':
+    logging.basicConfig(level=logging.INFO,
+                        stream=sys.stdout)
+    run()

app/.streamlit/config.toml ADDED

	@@ -0,0 +1,10 @@

+[server]
+enableCORS = false
+enableXsrfProtection = false
+[theme]
+primaryColor="#96666b"
+backgroundColor="#28282d"
+secondaryBackgroundColor="#333333"
+textColor="#f3f3f3"
+font="monospace"

assets/icon.png ADDED

assets/sst2_train.jsonl ADDED

	@@ -0,0 +1,32 @@

+{"label": "terrible", "idx": "61123", "sentence": "in its yearning for the days "}
+{"label": "great", "idx": "23159", "sentence": "of riveting set pieces "}
+{"label": "great", "idx": "21277", "sentence": "all-star reunions "}
+{"label": "terrible", "idx": "27987", "sentence": "( swimfan ) falls victim to sloppy plotting , an insultingly unbelievable final act and a villainess who is too crazy to be interesting . "}
+{"label": "great", "idx": "23240", "sentence": "the leads ) are such a companionable couple "}
+{"label": "great", "idx": "25838", "sentence": "astonishingly skillful and moving "}
+{"label": "terrible", "idx": "35653", "sentence": "has been sacrificed for the sake of spectacle "}
+{"label": "terrible", "idx": "49778", "sentence": "hard to imagine acting that could be any flatter "}
+{"label": "great", "idx": "2403", "sentence": "the better video-game-based flicks , "}
+{"label": "great", "idx": "135", "sentence": "so many of the challenges it poses for itself that one can forgive the film its flaws "}
+{"label": "great", "idx": "42426", "sentence": "while somewhat less than it might have been , the film is a good one "}
+{"label": "great", "idx": "6863", "sentence": "has a dashing and resourceful hero ; a lisping , reptilian villain ; big fights ; big hair ; lavish period scenery ; and a story "}
+{"label": "terrible", "idx": "42330", "sentence": "in the end , the weight of water comes to resemble the kind of soft-core twaddle you 'd expect to see on showtime 's ` red shoe diaries . ' "}
+{"label": "terrible", "idx": "57545", "sentence": "stuck in heaven because he 's afraid of his best-known creation ? "}
+{"label": "terrible", "idx": "23530", "sentence": "can be as tiresome as 9 seconds of jesse helms ' anti- castro "}
+{"label": "great", "idx": "54745", "sentence": "tackles the difficult subject of grief and loss with such life-embracing spirit that the theme does n't drag an audience down "}
+{"label": "terrible", "idx": "30797", "sentence": "violence "}
+{"label": "great", "idx": "30169", "sentence": "feminine energy , a tribute to the power of women to heal "}
+{"label": "terrible", "idx": "42869", "sentence": "flat as a spoof "}
+{"label": "terrible", "idx": "33313", "sentence": "( somebody suggested the stills might make a nice coffee table book ) "}
+{"label": "great", "idx": "10766", "sentence": "brosnan 's finest non-bondish performance "}
+{"label": "terrible", "idx": "15631", "sentence": "all seemed wasted like deniro 's once promising career and the once grand long beach boardwalk . "}
+{"label": "great", "idx": "3401", "sentence": "a lot smarter and more unnerving than the sequels "}
+{"label": "terrible", "idx": "29775", "sentence": "it may not be particularly innovative "}
+{"label": "great", "idx": "39776", "sentence": "it 's packed with adventure and a worthwhile environmental message , so it 's great for the kids . "}
+{"label": "terrible", "idx": "64246", "sentence": "mind ugly "}
+{"label": "terrible", "idx": "57404", "sentence": "meandering , norton has to recite bland police procedural details , fiennes wanders around in an attempt to seem weird and distanced , hopkins looks like a drag queen "}
+{"label": "terrible", "idx": "3766", "sentence": "pathetic idea "}
+{"label": "great", "idx": "63925", "sentence": "has done his homework and "}
+{"label": "great", "idx": "50687", "sentence": "a very compelling , sensitive , intelligent and almost cohesive piece "}
+{"label": "terrible", "idx": "54945", "sentence": "turn and devolves "}
+{"label": "great", "idx": "35874", "sentence": "about this silly , outrageous , ingenious thriller "}
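The sample data above is stored as JSON Lines: one JSON object per line, each with `label`, `idx`, and `sentence` fields. A minimal sketch of reading such records and tallying the labels follows. It assumes the three lines below (copied from the file) stand in for the full dataset; the names `sample_lines`, `records`, and `label_counts` are illustrative and are not part of the repository.

```python
import json
from collections import Counter

# Three records copied from assets/sst2_train.jsonl, used here as stand-in data.
sample_lines = [
    '{"label": "great", "idx": "23159", "sentence": "of riveting set pieces "}',
    '{"label": "terrible", "idx": "42869", "sentence": "flat as a spoof "}',
    '{"label": "great", "idx": "21277", "sentence": "all-star reunions "}',
]

# Each line is parsed independently; there is no enclosing JSON array.
records = [json.loads(line) for line in sample_lines]

# Count how many examples carry each label string.
label_counts = Counter(record["label"] for record in records)

# The sentences keep their trailing space, so strip before display if needed.
sentences = [record["sentence"].strip() for record in records]
```

Note that `idx` is stored as a string rather than an integer, so it would need an explicit `int(...)` conversion before any numeric use.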

autoprompt/__init__.py ADDED

File without changes

autoprompt/create_trigger.py ADDED

	@@ -0,0 +1,523 @@

+import time
+import argparse
+import json
+import logging
+from pathlib import Path
+import random
+import numpy as np
+import torch
+import torch.nn.functional as F
+from torch.utils.data import DataLoader
+import transformers
+from transformers import AutoConfig, AutoModelWithLMHead, AutoTokenizer
+from tqdm import tqdm
+import autoprompt.utils as utils
+logger = logging.getLogger(__name__)
+class GradientStorage:
+    """
+    This object stores the intermediate gradients of the output of the given PyTorch module, which
+    otherwise might not be retained.
+    """
+    def __init__(self, module):
+        self._stored_gradient = None
+        module.register_backward_hook(self.hook)
+    def hook(self, module, grad_in, grad_out):
+        self._stored_gradient = grad_out[0]
+    def get(self):
+        return self._stored_gradient
+class PredictWrapper:
+    """
+    PyTorch transformers model wrapper. Handles the necessary preprocessing of inputs for trigger
+    experiments.
+    """
+    def __init__(self, model):
+        self._model = model
+    def __call__(self, model_inputs, trigger_ids):
+        # Copy dict so pop operations don't have unwanted side-effects
+        model_inputs = model_inputs.copy()
+        trigger_mask = model_inputs.pop('trigger_mask')
+        predict_mask = model_inputs.pop('predict_mask')
+        model_inputs = replace_trigger_tokens(model_inputs, trigger_ids, trigger_mask)
+        logits, *_ = self._model(**model_inputs)
+        predict_logits = logits.masked_select(predict_mask.unsqueeze(-1)).view(logits.size(0), -1)
+        return predict_logits
+class AccuracyFn:
+    """
+    Computing the accuracy when a label is mapped to multiple tokens is difficult in the current
+    framework, since the data generator only gives us the token ids. To get around this we
+    compare the target loss (negative log-probability) to the loss of every label. If at most one
+    label (the target itself) has a loss less than or equal to the target loss, we know we are accurate.
+    """
+    def __init__(self, tokenizer, label_map, device, tokenize_labels=False):
+        self._all_label_ids = []
+        self._pred_to_label = []
+        logger.info(label_map)
+        for label, label_tokens in label_map.items():
+            self._all_label_ids.append(utils.encode_label(tokenizer, label_tokens, tokenize_labels).to(device))
+            self._pred_to_label.append(label)
+        logger.info(self._all_label_ids)
+    def __call__(self, predict_logits, gold_label_ids):
+        # Get the loss (negative total log-probability) for the true label
+        gold_logp = get_loss(predict_logits, gold_label_ids)
+        # Get the loss (negative total log-probability) for all labels
+        bsz = predict_logits.size(0)
+        all_label_logp = []
+        for label_ids in self._all_label_ids:
+            label_logp = get_loss(predict_logits, label_ids.repeat(bsz, 1))
+            all_label_logp.append(label_logp)
+        all_label_logp = torch.stack(all_label_logp, dim=-1)
+        _, predictions = all_label_logp.max(dim=-1)
+        predictions = [self._pred_to_label[x] for x in predictions.tolist()]
+        # Count the labels whose loss is less than or equal to the gold loss. When the gold label
+        # is the most likely one, only the gold label itself is counted.
+        ge_count = all_label_logp.le(gold_logp.unsqueeze(-1)).sum(-1)
+        correct = ge_count.le(1)  # <= 1 rather than == 1 in case of numerical precision issues
+        return correct.float()
+    # TODO: @rloganiv - This is hacky. Replace with something sensible.
+    def predict(self, predict_logits):
+        bsz = predict_logits.size(0)
+        all_label_logp = []
+        for label_ids in self._all_label_ids:
+            label_logp = get_loss(predict_logits, label_ids.repeat(bsz, 1))
+            all_label_logp.append(label_logp)
+        all_label_logp = torch.stack(all_label_logp, dim=-1)
+        # get_loss returns a negative log-probability, so the most likely label has the smallest value.
+        _, predictions = all_label_logp.min(dim=-1)
+        predictions = [self._pred_to_label[x] for x in predictions.tolist()]
+        return predictions
+def load_pretrained(model_name):
+    """
+    Loads pretrained HuggingFace config/model/tokenizer, as well as performs required
+    initialization steps to facilitate working with triggers.
+    """
+    config = AutoConfig.from_pretrained(model_name)
+    model = AutoModelWithLMHead.from_pretrained(model_name)
+    model.eval()
+    tokenizer = AutoTokenizer.from_pretrained(model_name, add_prefix_space=True)
+    utils.add_task_specific_tokens(tokenizer)
+    return config, model, tokenizer
+def set_seed(seed: int):
+    """Sets the relevant random seeds."""
+    random.seed(seed)
+    np.random.seed(seed)
+    torch.random.manual_seed(seed)
+    torch.cuda.manual_seed(seed)
+def get_embeddings(model, config):
+    """Returns the wordpiece embedding module."""
+    base_model = getattr(model, config.model_type)
+    embeddings = base_model.embeddings.word_embeddings
+    return embeddings
+def hotflip_attack(averaged_grad,
+                   embedding_matrix,
+                   increase_loss=False,
+                   num_candidates=1,
+                   filter=None):
+    """Returns the top candidate replacements."""
+    with torch.no_grad():
+        gradient_dot_embedding_matrix = torch.matmul(
+            embedding_matrix,
+            averaged_grad
+        )
+        if filter is not None:
+            gradient_dot_embedding_matrix -= filter
+        if not increase_loss:
+            gradient_dot_embedding_matrix *= -1
+        _, top_k_ids = gradient_dot_embedding_matrix.topk(num_candidates)
+    return top_k_ids
+def replace_trigger_tokens(model_inputs, trigger_ids, trigger_mask):
+    """Replaces the trigger tokens in input_ids."""
+    out = model_inputs.copy()
+    input_ids = model_inputs['input_ids']
+    trigger_ids = trigger_ids.repeat(trigger_mask.size(0), 1)
+    try:
+        filled = input_ids.masked_scatter(trigger_mask, trigger_ids)
+    except RuntimeError:
+        filled = input_ids
+    out['input_ids'] = filled
+    return out
+def get_loss(predict_logits, label_ids):
+    predict_logp = F.log_softmax(predict_logits, dim=-1)
+    target_logp = predict_logp.gather(-1, label_ids)
+    target_logp = target_logp - 1e32 * label_ids.eq(0)  # Apply mask
+    target_logp = torch.logsumexp(target_logp, dim=-1)
+    return -target_logp
+def isupper(idx, tokenizer):
+    """
+    Determines whether a token (e.g., word piece) begins with a capital letter.
+    """
+    _isupper = False
+    # We only want to check tokens that begin words. Since byte-pair encoding
+    # captures a prefix space, we need to check that the decoded token begins
+    # with a space, and has a capitalized second character.
+    if isinstance(tokenizer, transformers.GPT2Tokenizer):
+        decoded = tokenizer.decode([idx])
+        if decoded[0] == ' ' and decoded[1].isupper():
+            _isupper = True
+    # For all other tokenization schemes, we can just check the first character
+    # is capitalized.
+    elif tokenizer.decode([idx])[0].isupper():
+        _isupper = True
+    return _isupper
+def run_model(args):
+    set_seed(args.seed)
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    logger.info('Loading model, tokenizer, etc.')
+    config, model, tokenizer = load_pretrained(args.model_name)
+    model.to(device)
+    embeddings = get_embeddings(model, config)
+    embedding_gradient = GradientStorage(embeddings)
+    predictor = PredictWrapper(model)
+    if args.label_map is not None:
+        label_map = json.loads(args.label_map)
+        logger.info(f"Label map: {label_map}")
+    else:
+        label_map = None
+        logger.info('No label map')
+    templatizer = utils.TriggerTemplatizer(
+        args.template,
+        config,
+        tokenizer,
+        label_map=label_map,
+        label_field=args.label_field,
+        tokenize_labels=args.tokenize_labels,
+        add_special_tokens=False,
+        use_ctx=args.use_ctx
+    )
+    # Obtain the initial trigger tokens and label mapping
+    if args.initial_trigger:
+        trigger_ids = tokenizer.convert_tokens_to_ids(args.initial_trigger)
+        logger.debug(f'Initial trigger: {args.initial_trigger}')
+        logger.debug(f'Trigger ids: {trigger_ids}')
+        assert len(trigger_ids) == templatizer.num_trigger_tokens
+    else:
+        trigger_ids = [tokenizer.mask_token_id] * templatizer.num_trigger_tokens
+    trigger_ids = torch.tensor(trigger_ids, device=device).unsqueeze(0)
+    best_trigger_ids = trigger_ids.clone()
+    # NOTE: Accuracy can only be computed if a fixed pool of labels is given, which currently
+    # requires the label map to be specified. Since producing a label map may be cumbersome (e.g.,
+    # for link prediction tasks), we just use (negative) loss as the evaluation metric in these cases.
+    if label_map:
+        evaluation_fn = AccuracyFn(tokenizer, label_map, device)
+    else:
+        evaluation_fn = lambda x, y: -get_loss(x, y)
+    logger.info('Loading datasets')
+    collator = utils.Collator(pad_token_id=tokenizer.pad_token_id)
+    if args.perturbed:
+        train_dataset = utils.load_augmented_trigger_dataset(args.train, templatizer, limit=args.limit)
+    else:
+        train_dataset = utils.load_trigger_dataset(args.train, templatizer, use_ctx=args.use_ctx, limit=args.limit)
+    train_loader = DataLoader(train_dataset, batch_size=args.bsz, shuffle=True, collate_fn=collator)
+    if args.perturbed:
+        dev_dataset = utils.load_augmented_trigger_dataset(args.dev, templatizer)
+    else:
+        dev_dataset = utils.load_trigger_dataset(args.dev, templatizer, use_ctx=args.use_ctx)
+    dev_loader = DataLoader(dev_dataset, batch_size=args.eval_size, shuffle=False, collate_fn=collator)
+    # To "filter" unwanted trigger tokens, we subtract a huge number from their logits.
+    filter = torch.zeros(tokenizer.vocab_size, dtype=torch.float32, device=device)
+    if args.filter:
+        logger.info('Filtering label tokens.')
+        if label_map:
+            for label_tokens in label_map.values():
+                label_ids = utils.encode_label(tokenizer, label_tokens).unsqueeze(0)
+                filter[label_ids] = -1e32
+        else:
+            for _, label_ids in train_dataset:
+                filter[label_ids] = -1e32
+        logger.info('Filtering special tokens and capitalized words.')
+        for word, idx in tokenizer.get_vocab().items():
+            if len(word) == 1 or idx >= tokenizer.vocab_size:
+                continue
+            # Filter special tokens.
+            if idx in tokenizer.all_special_ids:
+                logger.debug('Filtered: %s', word)
+                filter[idx] = -1e32
+            # Filter capitalized words (lazy way to remove proper nouns).
+            if isupper(idx, tokenizer):
+                logger.debug('Filtered: %s', word)
+                filter[idx] = -1e32
+    logger.info('Evaluating')
+    numerator = 0
+    denominator = 0
+    for model_inputs, labels in tqdm(dev_loader):
+        model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
+        labels = labels.to(device)
+        with torch.no_grad():
+            predict_logits = predictor(model_inputs, trigger_ids)
+        numerator += evaluation_fn(predict_logits, labels).sum().item()
+        denominator += labels.size(0)
+    dev_metric = numerator / (denominator + 1e-13)
+    logger.info(f'Dev metric: {dev_metric}')
+    best_dev_metric = -float('inf')
+    # Measure elapsed time of trigger search
+    start = time.time()
+    for i in range(args.iters):
+        logger.info(f'Iteration: {i}')
+        logger.info('Accumulating Gradient')
+        model.zero_grad()
+        pbar = tqdm(range(args.accumulation_steps))
+        train_iter = iter(train_loader)
+        averaged_grad = None
+        # Accumulate
+        for step in pbar:
+            # Shuttle inputs to GPU
+            try:
+                model_inputs, labels = next(train_iter)
+            except StopIteration:
+                logger.warning(
+                    'Insufficient data for number of accumulation steps. '
+                    'Effective batch size will be smaller than specified.'
+                )
+                break
+            model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
+            labels = labels.to(device)
+            predict_logits = predictor(model_inputs, trigger_ids)
+            loss = get_loss(predict_logits, labels).mean()
+            loss.backward()
+            grad = embedding_gradient.get()
+            bsz, _, emb_dim = grad.size()
+            selection_mask = model_inputs['trigger_mask'].unsqueeze(-1)
+            grad = torch.masked_select(grad, selection_mask)
+            grad = grad.view(bsz, templatizer.num_trigger_tokens, emb_dim)
+            if averaged_grad is None:
+                averaged_grad = grad.sum(dim=0) / args.accumulation_steps
+            else:
+                averaged_grad += grad.sum(dim=0) / args.accumulation_steps
+        logger.info('Evaluating Candidates')
+        pbar = tqdm(range(args.accumulation_steps))
+        train_iter = iter(train_loader)
+        token_to_flip = random.randrange(templatizer.num_trigger_tokens)
+        candidates = hotflip_attack(averaged_grad[token_to_flip],
+                                    embeddings.weight,
+                                    increase_loss=False,
+                                    num_candidates=args.num_cand,
+                                    filter=filter)
+        current_score = 0
+        candidate_scores = torch.zeros(args.num_cand, device=device)
+        denom = 0
+        for step in pbar:
+            try:
+                model_inputs, labels = next(train_iter)
+            except StopIteration:
+                logger.warning(
+                    'Insufficient data for number of accumulation steps. '
+                    'Effective batch size will be smaller than specified.'
+                )
+                break
+            model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
+            labels = labels.to(device)
+            with torch.no_grad():
+                predict_logits = predictor(model_inputs, trigger_ids)
+                eval_metric = evaluation_fn(predict_logits, labels)
+            # Update current score
+            current_score += eval_metric.sum()
+            denom += labels.size(0)
+            # NOTE: Instead of iterating over tokens to flip we randomly change just one each
+            # time so the gradients don't get stale.
+            for i, candidate in enumerate(candidates):
+                # if candidate.item() in filter_candidates:
+                #     candidate_scores[i] = -1e32
+                #     continue
+                temp_trigger = trigger_ids.clone()
+                temp_trigger[:, token_to_flip] = candidate
+                with torch.no_grad():
+                    predict_logits = predictor(model_inputs, temp_trigger)
+                    eval_metric = evaluation_fn(predict_logits, labels)
+                candidate_scores[i] += eval_metric.sum()
+        # TODO: Something cleaner. LAMA templates can't have mask tokens, so if
+        # there are still mask tokens in the trigger then set the current score
+        # to -inf.
+        if args.print_lama:
+            if trigger_ids.eq(tokenizer.mask_token_id).any():
+                current_score = float('-inf')
+        if (candidate_scores > current_score).any():
+            logger.info('Better trigger detected.')
+            best_candidate_score = candidate_scores.max()
+            best_candidate_idx = candidate_scores.argmax()
+            trigger_ids[:, token_to_flip] = candidates[best_candidate_idx]
+            logger.info(f'Train metric: {best_candidate_score / (denom + 1e-13): 0.4f}')
+        else:
+            logger.info('No improvement detected. Skipping evaluation.')
+            continue
+        logger.info('Evaluating')
+        numerator = 0
+        denominator = 0
+        for model_inputs, labels in tqdm(dev_loader):
+            model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
+            labels = labels.to(device)
+            with torch.no_grad():
+                predict_logits = predictor(model_inputs, trigger_ids)
+            numerator += evaluation_fn(predict_logits, labels).sum().item()
+            denominator += labels.size(0)
+        dev_metric = numerator / (denominator + 1e-13)
+        logger.info(f'Trigger tokens: {tokenizer.convert_ids_to_tokens(trigger_ids.squeeze(0))}')
+        logger.info(f'Dev metric: {dev_metric}')
+        # TODO: Something cleaner. LAMA templates can't have mask tokens, so if
+        # there are still mask tokens in the trigger then set the current score
+        # to -inf.
+        if args.print_lama:
+            if best_trigger_ids.eq(tokenizer.mask_token_id).any():
+                best_dev_metric = float('-inf')
+        if dev_metric > best_dev_metric:
+            logger.info('Best performance so far')
+            best_trigger_ids = trigger_ids.clone()
+            best_dev_metric = dev_metric
+    best_trigger_tokens = tokenizer.convert_ids_to_tokens(best_trigger_ids.squeeze(0))
+    logger.info(f'Best tokens: {best_trigger_tokens}')
+    logger.info(f'Best dev metric: {best_dev_metric}')
+    if args.print_lama:
+        # Templatize with [X] and [Y]
+        if args.use_ctx:
+            model_inputs, label_ids = templatizer({
+                'sub_label': '[X]',
+                'obj_label': tokenizer.lama_y,
+                'context': ''
+            })
+        else:
+            model_inputs, label_ids = templatizer({
+                'sub_label': '[X]',
+                'obj_label': tokenizer.lama_y,
+            })
+        lama_template = model_inputs['input_ids']
+        # Instantiate trigger tokens
+        lama_template.masked_scatter_(
+            mask=model_inputs['trigger_mask'],
+            source=best_trigger_ids.cpu())
+        # Instantiate label token
+        lama_template.masked_scatter_(
+            mask=model_inputs['predict_mask'],
+            source=label_ids)
+        # Print LAMA JSON template
+        relation = args.train.parent.stem
+        # The following block of code is a bit hacky but whatever, it gets the job done
+        if args.use_ctx:
+            template = tokenizer.decode(lama_template.squeeze(0)[1:-1]).replace('[SEP] ', '').replace('</s> ', '').replace('[ X ]', '[X]')
+        else:
+            template = tokenizer.decode(lama_template.squeeze(0)[1:-1]).replace('[ X ]', '[X]')
+        out = {
+            'relation': relation,
+            'template': template
+        }
+        print(json.dumps(out))
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--train', type=Path, required=True, help='Train data path')
+    parser.add_argument('--dev', type=Path, required=True, help='Dev data path')
+    parser.add_argument('--template', type=str, help='Template string')
+    parser.add_argument('--label-map', type=str, default=None, help='JSON object defining label map')
+    # LAMA-specific
+    parser.add_argument('--tokenize-labels', action='store_true',
+                        help='If specified, labels are split into word pieces. '
+                             'Needed for LAMA probe experiments.')
+    parser.add_argument('--filter', action='store_true',
+                        help='If specified, filter out special tokens and gold objects. '
+                             'Furthermore, tokens starting with capital '
+                             'letters will not appear in triggers. Lazy '
+                             'approach for removing proper nouns.')
+    parser.add_argument('--print-lama', action='store_true',
+                        help='Prints best trigger in LAMA format.')
+    parser.add_argument('--initial-trigger', nargs='+', type=str, default=None, help='Manual prompt')
+    parser.add_argument('--label-field', type=str, default='label',
+                        help='Name of the label field')
+    parser.add_argument('--bsz', type=int, default=32, help='Batch size')
+    parser.add_argument('--eval-size', type=int, default=256, help='Eval size')
+    parser.add_argument('--iters', type=int, default=100,
+                        help='Number of iterations to run trigger search algorithm')
+    parser.add_argument('--accumulation-steps', type=int, default=10)
+    parser.add_argument('--model-name', type=str, default='bert-base-cased',
+                        help='Model name passed to HuggingFace AutoX classes.')
+    parser.add_argument('--seed', type=int, default=0)
+    parser.add_argument('--limit', type=int, default=None)
+    parser.add_argument('--use-ctx', action='store_true',
+                        help='Use context sentences for relation extraction only')
+    parser.add_argument('--perturbed', action='store_true',
+                        help='Perturbed sentence evaluation of relation extraction: replace each object in dataset with a random other object')
+    parser.add_argument('--patience', type=int, default=5)
+    parser.add_argument('--num-cand', type=int, default=10)
+    parser.add_argument('--sentence-size', type=int, default=50)
+    parser.add_argument('--debug', action='store_true')
+    args = parser.parse_args()
+    if args.debug:
+        level = logging.DEBUG
+    else:
+        level = logging.INFO
+    logging.basicConfig(level=level)
+    run_model(args)
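The candidate selection in `hotflip_attack` above ranks vocabulary entries by the dot product between each token embedding and the averaged loss gradient. The following is a dependency-free sketch of that ranking for the `increase_loss=False` case. The names `hotflip_candidates` and `blocked`, and the toy values, are hypothetical stand-ins; the script itself operates on PyTorch tensors and applies its filter as a large additive penalty rather than skipping ids.

```python
def hotflip_candidates(averaged_grad, embedding_matrix, num_candidates=1, blocked=()):
    """Rank token ids by a first-order estimate of how much they lower the loss."""
    scored = []
    for token_id, embedding in enumerate(embedding_matrix):
        if token_id in blocked:
            # Stand-in for the filter vector: blocked tokens are never proposed.
            continue
        # Dot product of the token embedding with the gradient of the loss.
        dot = sum(e * g for e, g in zip(embedding, averaged_grad))
        # To decrease the loss we prefer the most negative dot product,
        # so the score is the negated value.
        scored.append((-dot, token_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [token_id for _, token_id in scored[:num_candidates]]


# Toy vocabulary of four tokens with two-dimensional embeddings.
embedding_matrix = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, -1.0],
    [0.5, 0.5],
]
averaged_grad = [1.0, 2.0]

top_two = hotflip_candidates(averaged_grad, embedding_matrix, num_candidates=2)
top_two_filtered = hotflip_candidates(
    averaged_grad, embedding_matrix, num_candidates=2, blocked={2}
)
```

The ranking is only a linear approximation, which is why the search loop above re-scores every proposed candidate with a real forward pass before accepting a replacement.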

autoprompt/finetune.py ADDED

	@@ -0,0 +1,203 @@

+"""
+Script for running finetuning on GLUE tasks.
+Largely copied from:
+    https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_glue.py
+"""
+import argparse
+import logging
+from pathlib import Path
+import random
+import numpy as np
+import torch
+import torch.nn.functional as F
+from torch.utils.data import DataLoader
+from torch.optim.lr_scheduler import LambdaLR
+import transformers
+from transformers import (
+    AdamW, AutoConfig, AutoModelForSequenceClassification, AutoTokenizer
+)
+from tqdm import tqdm
+import autoprompt.utils as utils
+logger = logging.getLogger(__name__)
+def set_seed(seed: int):
+    """Sets the relevant random seeds."""
+    random.seed(seed)
+    np.random.seed(seed)
+    torch.random.manual_seed(seed)
+    torch.cuda.manual_seed(seed)
+def get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, last_epoch=-1):
+    """ Create a schedule with a learning rate that decreases linearly after
+    linearly increasing during a warmup period.
+    From:
+        https://github.com/uds-lsv/bert-stable-fine-tuning/blob/master/src/transformers/optimization.py
+    """
+    def lr_lambda(current_step):
+        if current_step < num_warmup_steps:
+            return float(current_step) / float(max(1, num_warmup_steps))
+        return max(
+            0.0, float(num_training_steps - current_step) / float(max(1, num_training_steps - num_warmup_steps))
+        )
+    return LambdaLR(optimizer, lr_lambda, last_epoch)
+def main(args):
+    set_seed(args.seed)
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    config = AutoConfig.from_pretrained(args.model_name, num_labels=args.num_labels)
+    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
+    model = AutoModelForSequenceClassification.from_pretrained(args.model_name, config=config)
+    model.to(device)
+    collator = utils.Collator(pad_token_id=tokenizer.pad_token_id)
+    train_dataset, label_map = utils.load_classification_dataset(
+        args.train,
+        tokenizer,
+        args.field_a,
+        args.field_b,
+        args.label_field,
+        limit=args.limit
+    )
+    train_loader = DataLoader(train_dataset, batch_size=args.bsz, shuffle=True, collate_fn=collator)
+    dev_dataset, _ = utils.load_classification_dataset(
+        args.dev,
+        tokenizer,
+        args.field_a,
+        args.field_b,
+        args.label_field,
+        label_map
+    )
+    dev_loader = DataLoader(dev_dataset, batch_size=args.bsz, shuffle=False, collate_fn=collator)
+    test_dataset, _ = utils.load_classification_dataset(
+        args.test,
+        tokenizer,
+        args.field_a,
+        args.field_b,
+        args.label_field,
+        label_map
+    )
+    test_loader = DataLoader(test_dataset, batch_size=args.bsz, shuffle=False, collate_fn=collator)
+    if args.bias_correction:
+        betas = (0.9, 0.999)
+    else:
+        betas = (0.0, 0.000)
+    optimizer = AdamW(
+        model.parameters(),
+        lr=args.lr,
+        weight_decay=1e-2,
+        betas=betas
+    )
+    # Use suggested learning rate scheduler
+    num_training_steps = len(train_dataset) * args.epochs // args.bsz
+    num_warmup_steps = num_training_steps // 10
+    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps,
+                                                num_training_steps)
+    if not args.ckpt_dir.exists():
+        logger.info(f'Making checkpoint directory: {args.ckpt_dir}')
+        args.ckpt_dir.mkdir(parents=True)
+    elif not args.force_overwrite:
+        raise RuntimeError('Checkpoint directory already exists.')
+    try:
+        best_accuracy = 0
+        for epoch in range(args.epochs):
+            logger.info('Training...')
+            model.train()
+            avg_loss = utils.ExponentialMovingAverage()
+            pbar = tqdm(train_loader)
+            for model_inputs, labels in pbar:
+                model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
+                labels = labels.to(device)
+                optimizer.zero_grad()
+                logits, *_ = model(**model_inputs)
+                loss = F.cross_entropy(logits, labels.squeeze(-1))
+                loss.backward()
+                optimizer.step()
+                scheduler.step()
+                avg_loss.update(loss.item())
+                pbar.set_description(f'loss: {avg_loss.get_metric(): 0.4f}, '
+                                     f'lr: {optimizer.param_groups[0]["lr"]: .3e}')
+            logger.info('Evaluating...')
+            model.eval()
+            correct = 0
+            total = 0
+            with torch.no_grad():
+                for model_inputs, labels in dev_loader:
+                    model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
+                    labels = labels.to(device)
+                    logits, *_ = model(**model_inputs)
+                    _, preds = logits.max(dim=-1)
+                    correct += (preds == labels.squeeze(-1)).sum().item()
+                    total += labels.size(0)
+                accuracy = correct / (total + 1e-13)
+            logger.info(f'Accuracy: {accuracy : 0.4f}')
+            if accuracy > best_accuracy:
+                logger.info('Best performance so far.')
+                model.save_pretrained(args.ckpt_dir)
+                tokenizer.save_pretrained(args.ckpt_dir)
+                best_accuracy = accuracy
+    except KeyboardInterrupt:
+        logger.info('Interrupted...')
+    logger.info('Testing...')
+    model.eval()
+    correct = 0
+    total = 0
+    with torch.no_grad():
+        for model_inputs, labels in test_loader:
+            model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
+            labels = labels.to(device)
+            logits, *_ = model(**model_inputs)
+            _, preds = logits.max(dim=-1)
+            correct += (preds == labels.squeeze(-1)).sum().item()
+            total += labels.size(0)
+        accuracy = correct / (total + 1e-13)
+    logger.info(f'Accuracy: {accuracy : 0.4f}')
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--model-name', type=str)
+    parser.add_argument('--train', type=Path)
+    parser.add_argument('--dev', type=Path)
+    parser.add_argument('--test', type=Path)
+    parser.add_argument('--field-a', type=str)
+    parser.add_argument('--field-b', type=str, default=None)
+    parser.add_argument('--label-field', type=str, default='label')
+    parser.add_argument('--ckpt-dir', type=Path, default=Path('ckpt/'))
+    parser.add_argument('--num-labels', type=int, default=2)
+    parser.add_argument('--bsz', type=int, default=32)
+    parser.add_argument('--epochs', type=int, default=3)
+    parser.add_argument('--lr', type=float, default=2e-5)
+    parser.add_argument('--limit', type=int, default=None)
+    parser.add_argument('--seed', type=int, default=1234)
+    parser.add_argument('--bias-correction', action='store_true')
+    parser.add_argument('-f', '--force-overwrite', action='store_true')
+    parser.add_argument('--debug', action='store_true')
+    args = parser.parse_args()
+    if args.debug:
+        level = logging.DEBUG
+    else:
+        level = logging.INFO
+    logging.basicConfig(level=level)
+    main(args)
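
The evaluation and test loops in `finetune.py` above share one pattern: take the argmax of the logits for each example, count matches against the labels, and divide by the example count plus a small epsilon so that an empty loader does not raise. Below is a minimal sketch of that bookkeeping, assuming plain Python lists in place of tensors. The name `batch_accuracy` and the toy batches are hypothetical and not part of the repository.

```python
def batch_accuracy(batches):
    """Accumulate accuracy over (logits, labels) batches.

    `logits` is a list of per-class score lists, `labels` a list of ints.
    """
    correct = 0
    total = 0
    for logits, labels in batches:
        # argmax over the class axis, mirroring `logits.max(dim=-1)` in the script
        preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
        correct += sum(int(p == y) for p, y in zip(preds, labels))
        total += len(labels)
    # The epsilon keeps the division defined when no examples were seen
    return correct / (total + 1e-13)


# Hypothetical toy data: two batches of two examples each
batches = [
    ([[0.1, 0.9], [2.0, -1.0]], [1, 0]),
    ([[0.3, 0.2], [0.0, 0.5]], [1, 1]),
]
accuracy = batch_accuracy(batches)
print(f'Accuracy: {accuracy: 0.4f}')
```

The epsilon makes the result marginally smaller than the exact ratio, which is invisible at four decimal places.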

autoprompt/label_search.py ADDED

	@@ -0,0 +1,162 @@

+"""
+This is a hacky little attempt using the tools from the trigger creation script to identify a
+good set of label strings. The idea is to train a linear classifier over the predict token and
+then look at the most similar tokens.
+"""
+import argparse
+import json
+import logging
+from pathlib import Path
+import torch
+import torch.nn.functional as F
+from torch.utils.data import DataLoader
+from transformers import (
+    AutoConfig, AutoModelWithLMHead, AutoTokenizer, BertForMaskedLM, RobertaForMaskedLM
+)
+from tqdm import tqdm
+import autoprompt.utils as utils
+import autoprompt.create_trigger as ct
+logger = logging.getLogger(__name__)
+def load_pretrained(model_name):
+    """
+    Loads pretrained HuggingFace config/model/tokenizer, as well as performs required
+    initialization steps to facilitate working with triggers.
+    """
+    config = AutoConfig.from_pretrained(model_name)
+    model = AutoModelWithLMHead.from_pretrained(model_name, config=config)
+    model.eval()
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+    utils.add_task_specific_tokens(tokenizer)
+    return config, model, tokenizer
+def get_final_embeddings(model):
+    if isinstance(model, BertForMaskedLM):
+        return model.cls.predictions.transform
+    elif isinstance(model, RobertaForMaskedLM):
+        return model.lm_head.layer_norm
+    else:
+        raise NotImplementedError(f'{model} not currently supported')
+def get_word_embeddings(model):
+    if isinstance(model, BertForMaskedLM):
+        return model.cls.predictions.decoder.weight
+    elif isinstance(model, RobertaForMaskedLM):
+        return model.lm_head.decoder.weight
+    else:
+        raise NotImplementedError(f'{model} not currently supported')
+def main(args):
+    ct.set_seed(args.seed)
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    logger.info('Loading model, tokenizer, etc.')
+    config, model, tokenizer = load_pretrained(args.model_name)
+    model.to(device)
+    final_embeddings = get_final_embeddings(model)
+    embedding_storage = utils.OutputStorage(final_embeddings)
+    word_embeddings = get_word_embeddings(model)
+    label_map = json.loads(args.label_map)
+    reverse_label_map = {y: x for x, y in label_map.items()}
+    templatizer = utils.TriggerTemplatizer(
+        args.template,
+        config,
+        tokenizer,
+        label_map=label_map,
+        label_field=args.label_field,
+        add_special_tokens=False
+    )
+    # The weights of this projection will help identify the best label words.
+    projection = torch.nn.Linear(config.hidden_size, len(label_map))
+    projection.to(device)
+    # Obtain the initial trigger tokens and label mapping
+    if args.initial_trigger:
+        trigger_ids = tokenizer.encode(
+            args.initial_trigger,
+            add_special_tokens=False,
+            add_prefix_space=True
+        )
+        assert len(trigger_ids) == templatizer.num_trigger_tokens
+    else:
+        trigger_ids = [tokenizer.mask_token_id] * templatizer.num_trigger_tokens
+    trigger_ids = torch.tensor(trigger_ids, device=device).unsqueeze(0)
+    logger.info('Loading datasets')
+    collator = utils.Collator(pad_token_id=tokenizer.pad_token_id)
+    train_dataset = utils.load_trigger_dataset(args.train, templatizer, use_ctx=False)
+    train_loader = DataLoader(train_dataset, batch_size=args.bsz, shuffle=True, collate_fn=collator)
+    optimizer = torch.optim.Adam(projection.parameters(), lr=args.lr)
+    scores = torch.matmul(projection.weight, word_embeddings.transpose(0, 1))
+    scores = F.softmax(scores, dim=0)
+    for i, row in enumerate(scores):
+        _, top = row.topk(args.k)
+        decoded = tokenizer.convert_ids_to_tokens(top)
+        logger.info(f"Top k for class {reverse_label_map[i]}: {', '.join(decoded)}")
+    logger.info('Training')
+    for i in range(args.iters):
+        pbar = tqdm(train_loader)
+        for model_inputs, labels in pbar:
+            optimizer.zero_grad()
+            model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
+            labels = labels.to(device)
+            trigger_mask = model_inputs.pop('trigger_mask')
+            predict_mask = model_inputs.pop('predict_mask')
+            model_inputs = ct.replace_trigger_tokens(model_inputs, trigger_ids, trigger_mask)
+            with torch.no_grad():
+                model(**model_inputs)
+            embeddings = embedding_storage.get()
+            predict_embeddings = embeddings.masked_select(predict_mask.unsqueeze(-1)).view(embeddings.size(0), -1)
+            logits = projection(predict_embeddings)
+            loss = F.cross_entropy(logits, labels.squeeze(-1))
+            loss.backward()
+            optimizer.step()
+            pbar.set_description(f'loss: {loss : 0.4f}')
+        scores = torch.matmul(projection.weight, word_embeddings.transpose(0, 1))
+        scores = F.softmax(scores, dim=0)
+        for i, row in enumerate(scores):
+            _, top = row.topk(args.k)
+            decoded = tokenizer.convert_ids_to_tokens(top)
+            logger.info(f"Top k for class {reverse_label_map[i]}: {', '.join(decoded)}")
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--train', type=Path, required=True, help='Train data path')
+    parser.add_argument('--template', type=str, help='Template string')
+    parser.add_argument('--label-map', type=str, help='JSON object defining label map')
+    parser.add_argument('--initial-trigger', type=str, default=None, help='Manual prompt')
+    parser.add_argument('--label-field', type=str, default='label',
+                        help='Name of the label field')
+    parser.add_argument('--lr', type=float, default=3e-4, help='Learning rate')
+    parser.add_argument('--k', type=int, default=50, help='Number of label tokens to print')
+    parser.add_argument('--bsz', type=int, default=32, help='Batch size')
+    parser.add_argument('--iters', type=int, default=10,
+                        help='Number of iterations to run label search')
+    parser.add_argument('--model-name', type=str, default='bert-base-cased',
+                        help='Model name passed to HuggingFace AutoX classes.')
+    parser.add_argument('--seed', type=int, default=0)
+    parser.add_argument('--debug', action='store_true')
+    args = parser.parse_args()
+    if args.debug:
+        level = logging.DEBUG
+    else:
+        level = logging.INFO
+    logging.basicConfig(level=level)
+    main(args)
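
The scoring step in `label_search.py` above multiplies the learned projection weights by the transposed word embeddings, applies a softmax across classes (`dim=0`), and prints the top-k tokens per class. The following is a small pure-Python sketch of that computation, assuming nested lists instead of tensors. The function name `top_k_label_tokens`, the three-word vocabulary, and the two-dimensional embeddings are all hypothetical and chosen only for illustration.

```python
import math


def top_k_label_tokens(projection_weight, word_embeddings, vocab, k):
    """Return, for each class, the k vocabulary tokens with the highest score."""
    # scores[c][v] = dot(projection_weight[c], word_embeddings[v])
    scores = [
        [sum(w * e for w, e in zip(w_row, e_row)) for e_row in word_embeddings]
        for w_row in projection_weight
    ]
    # Softmax over the class axis: each vocabulary column is normalized
    # so that its scores across classes sum to one.
    for v in range(len(vocab)):
        column = [scores[c][v] for c in range(len(scores))]
        shift = max(column)  # subtract the max for numerical stability
        exps = [math.exp(s - shift) for s in column]
        norm = sum(exps)
        for c in range(len(scores)):
            scores[c][v] = exps[c] / norm
    top = []
    for row in scores:
        order = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)
        top.append([vocab[v] for v in order[:k]])
    return top


# Hypothetical toy setup: 2 classes, hidden size 2, 3-token vocabulary
vocab = ['good', 'bad', 'the']
word_embeddings = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
projection_weight = [[1.0, 0.0], [0.0, 1.0]]
top = top_k_label_tokens(projection_weight, word_embeddings, vocab, k=2)
print(top)
```

Because the normalization runs across classes rather than across the vocabulary, a token that no class prefers (here `the`) receives the same score in every row.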

autoprompt/popsicle.py ADDED

	@@ -0,0 +1,134 @@

+"""
+Frozen model with a linear topping...I'm really sleepy...
+"""
+import logging
+import torch
+from torch.nn import CrossEntropyLoss, MSELoss
+from transformers import (
+    AutoConfig,
+    BertConfig,
+    BertForSequenceClassification,
+    PretrainedConfig,
+    RobertaConfig,
+    RobertaForSequenceClassification
+)
+logger = logging.getLogger(__name__)
+class Bertsicle(BertForSequenceClassification):
+    def forward(
+        self,
+        input_ids=None,
+        attention_mask=None,
+        token_type_ids=None,
+        position_ids=None,
+        head_mask=None,
+        inputs_embeds=None,
+        labels=None,
+    ):
+        with torch.no_grad():
+            outputs = self.bert(
+                input_ids,
+                attention_mask=attention_mask,
+                token_type_ids=token_type_ids,
+                position_ids=position_ids,
+                head_mask=head_mask,
+                inputs_embeds=inputs_embeds,
+            )
+        # Mean-pool the token representations instead of using the pooler output (outputs[1]).
+        pooled_output = outputs[0]
+        pooled_output = pooled_output[:, 1:, :]  # eliminating CLS token
+        pooled_output = torch.mean(pooled_output, dim=1)
+        pooled_output = self.dropout(pooled_output)
+        logits = self.classifier(pooled_output)
+        outputs = (logits,) + outputs[2:]  # add hidden states and attention if they are here
+        if labels is not None:
+            if self.num_labels == 1:
+                #  We are doing regression
+                loss_fct = MSELoss()
+                loss = loss_fct(logits.view(-1), labels.view(-1))
+            else:
+                loss_fct = CrossEntropyLoss()
+                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+            outputs = (loss,) + outputs
+        return outputs  # (loss), logits, (hidden_states), (attentions)
+class Robertasicle(RobertaForSequenceClassification):
+    def forward(
+        self,
+        input_ids=None,
+        attention_mask=None,
+        token_type_ids=None,
+        position_ids=None,
+        head_mask=None,
+        inputs_embeds=None,
+        labels=None,
+    ):
+        with torch.no_grad():
+            outputs = self.roberta(
+                input_ids,
+                attention_mask=attention_mask,
+                token_type_ids=token_type_ids,
+                position_ids=position_ids,
+                head_mask=head_mask,
+                inputs_embeds=inputs_embeds,
+            )
+        sequence_output = outputs[0]
+        sequence_output = sequence_output[:, 1:, :]  # eliminating <s> token
+        pooled_sequence_output = torch.mean(sequence_output, dim=1, keepdim=True)
+        logits = self.classifier(pooled_sequence_output)
+        outputs = (logits,) + outputs[2:]
+        if labels is not None:
+            if self.num_labels == 1:
+                #  We are doing regression
+                loss_fct = MSELoss()
+                loss = loss_fct(logits.view(-1), labels.view(-1))
+            else:
+                loss_fct = CrossEntropyLoss()
+                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+            outputs = (loss,) + outputs
+        return outputs  # (loss), logits, (hidden_states), (attentions)
+MODEL_MAPPING = {
+        RobertaConfig: Robertasicle,
+        BertConfig: Bertsicle
+}
+class AutoPopsicle:
+    def __init__(self):
+        raise EnvironmentError('You done goofed. Use `.from_pretrained()` or something.')
+    @classmethod
+    def from_config(cls, config):
+        for config_class, model_class in MODEL_MAPPING.items():
+            if isinstance(config, config_class):
+                return model_class(config)
+        raise ValueError('We do not support this config.')
+    @classmethod
+    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
+        config = kwargs.pop("config", None)
+        if not isinstance(config, PretrainedConfig):
+            config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
+        for config_class, model_class in MODEL_MAPPING.items():
+            if isinstance(config, config_class):
+                logger.info(f'Config class: {config_class}')
+                logger.info(f'Model class: {model_class}')
+                return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
+        raise ValueError(f'We do not support "{pretrained_model_name_or_path}".')
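
Both `Bertsicle` and `Robertasicle` above drop the first position (`[CLS]` or `<s>`) from the frozen encoder output and average the remaining token vectors before classification. Here is a minimal sketch of that pooling step for a single sequence, assuming plain lists of floats rather than tensors. The helper name `mean_pool_without_first` and the toy hidden states are hypothetical.

```python
def mean_pool_without_first(hidden_states):
    """Average all token vectors except the first one.

    `hidden_states` is a list of equal-length float lists, one per token.
    """
    rest = hidden_states[1:]  # drop the [CLS] / <s> position
    dim = len(rest[0])
    return [sum(vec[d] for vec in rest) / len(rest) for d in range(dim)]


# Hypothetical toy sequence: one special token followed by two word tokens
hidden = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0]]
pooled = mean_pool_without_first(hidden)
print(pooled)
```

As in the classes above, no attention mask is applied, so any padded positions in a batch would be included in the average.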

autoprompt/run_linear_probe.py ADDED

	@@ -0,0 +1,151 @@

+"""
+Script for running a linear probe on glue tasks.
+Largely copied from:
+    https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_glue.py
+"""
+import argparse
+import logging
+from pathlib import Path
+import torch
+import torch.nn.functional as F
+from torch.utils.data import DataLoader
+from transformers import AutoConfig, AutoTokenizer, WEIGHTS_NAME, CONFIG_NAME
+from tqdm import tqdm
+from autoprompt.popsicle import AutoPopsicle
+import autoprompt.utils as utils
+logger = logging.getLogger(__name__)
+def main(args):
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    config = AutoConfig.from_pretrained(args.model_name, num_labels=args.num_labels)
+    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
+    model = AutoPopsicle.from_pretrained(args.model_name, config=config)
+    model.to(device)
+    collator = utils.Collator(pad_token_id=tokenizer.pad_token_id)
+    train_dataset, label_map = utils.load_classification_dataset(
+        args.train,
+        tokenizer,
+        args.field_a,
+        args.field_b,
+        args.label_field
+    )
+    train_loader = DataLoader(train_dataset, batch_size=args.bsz, shuffle=True, collate_fn=collator)
+    dev_dataset, _ = utils.load_classification_dataset(
+        args.dev,
+        tokenizer,
+        args.field_a,
+        args.field_b,
+        args.label_field,
+        label_map
+    )
+    dev_loader = DataLoader(dev_dataset, batch_size=args.bsz, shuffle=True, collate_fn=collator)
+    test_dataset, _ = utils.load_classification_dataset(
+        args.test,
+        tokenizer,
+        args.field_a,
+        args.field_b,
+        args.label_field,
+        label_map
+    )
+    test_loader = DataLoader(test_dataset, batch_size=args.bsz, shuffle=True, collate_fn=collator)
+    optimizer = torch.optim.Adam(model.classifier.parameters(), lr=args.lr, weight_decay=1e-6)
+    if not args.ckpt_dir.exists():
+        # logger.info(f'Making checkpoint directory: {args.ckpt_dir}')
+        args.ckpt_dir.mkdir(parents=True)
+    elif not args.force_overwrite:
+        raise RuntimeError('Checkpoint directory already exists.')
+    best_accuracy = 0
+    try:
+        for epoch in range(args.epochs):
+            logger.info('Training...')
+            model.eval()  # Linear probe: stay in eval mode so dropout does not perturb the frozen model's outputs during training.
+            avg_loss = utils.ExponentialMovingAverage()
+            pbar = tqdm(train_loader)
+            for model_inputs, labels in pbar:
+                model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
+                labels = labels.to(device)
+                optimizer.zero_grad()
+                logits, *_ = model(**model_inputs)
+                loss = F.cross_entropy(logits, labels.squeeze(-1))
+                loss.backward()
+                optimizer.step()
+                avg_loss.update(loss.item())
+                pbar.set_description(f'loss: {avg_loss.get_metric(): 0.4f}')
+            logger.info('Evaluating...')
+            model.eval()
+            correct = 0
+            total = 0
+            for model_inputs, labels in dev_loader:
+                model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
+                labels = labels.to(device)
+                logits, *_ = model(**model_inputs)
+                _, preds = logits.max(dim=-1)
+                correct += (preds == labels.squeeze(-1)).sum().item()
+                total += labels.size(0)
+            accuracy = correct / (total + 1e-13)
+            logger.info(f'Accuracy: {accuracy : 0.4f}')
+            if accuracy > best_accuracy:
+                logger.info('Best performance so far. Saving...')
+                # torch.save(model.state_dict(), args.ckpt_dir / WEIGHTS_NAME)
+                # model.config.to_json_file(args.ckpt_dir / CONFIG_NAME)
+                model.save_pretrained(args.ckpt_dir)
+                tokenizer.save_pretrained(args.ckpt_dir)
+                best_accuracy = accuracy
+    except KeyboardInterrupt:
+        logger.info('Training manually terminated.')
+    logger.info('Testing...')
+    checkpoint = torch.load(args.ckpt_dir / WEIGHTS_NAME)
+    model.load_state_dict(checkpoint)
+    model.eval()
+    correct = 0
+    total = 0
+    for model_inputs, labels in test_loader:
+        model_inputs = {k: v.to(device) for k, v in model_inputs.items()}
+        labels = labels.to(device)
+        logits, *_ = model(**model_inputs)
+        _, preds = logits.max(dim=-1)
+        correct += (preds == labels.squeeze(-1)).sum().item()
+        total += labels.size(0)
+    accuracy = correct / (total + 1e-13)
+    logger.info(f'Accuracy: {accuracy : 0.4f}')
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--model-name', type=str)
+    parser.add_argument('--train', type=Path)
+    parser.add_argument('--dev', type=Path)
+    parser.add_argument('--test', type=Path)
+    parser.add_argument('--field-a', type=str)
+    parser.add_argument('--field-b', type=str, default=None)
+    parser.add_argument('--label-field', type=str, default='label')
+    parser.add_argument('--ckpt-dir', type=Path, default=Path('ckpt/'))
+    parser.add_argument('--num-labels', type=int, default=2)
+    parser.add_argument('--bsz', type=int, default=32)
+    parser.add_argument('--epochs', type=int, default=10)
+    parser.add_argument('--lr', type=float, default=1e-3)
+    parser.add_argument('-f', '--force-overwrite', action='store_true', default=True)
+    parser.add_argument('--debug', action='store_true')
+    parser.add_argument('--log_file', type=str, default='log.txt')
+    args = parser.parse_args()
+    if args.debug:
+        level = logging.DEBUG
+    else:
+        level = logging.INFO
+    logging.basicConfig(level=level, filename=args.log_file)
+    main(args)
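
The idea behind `run_linear_probe.py` above is that the encoder stays fixed and only a linear classifier on top of its features is trained. The sketch below illustrates that idea under simplifying assumptions: it uses binary logistic regression with plain SGD in pure Python, whereas the script uses multi-class cross-entropy with Adam. The function names and the toy "frozen features" are hypothetical.

```python
import math


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit weights and bias of a logistic-regression probe on fixed features."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of the log loss with respect to z
            # Only the probe parameters are updated; the features never change
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    """Predict class 1 when the linear score is positive, else class 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Hypothetical frozen features for four examples (linearly separable)
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
w, b = train_linear_probe(features, labels)
preds = [predict(w, b, x) for x in features]
print(preds)
```

Since only `w` and `b` change, probe accuracy reflects how linearly separable the frozen representations already are.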

autoprompt/utils.py ADDED

	@@ -0,0 +1,376 @@

+import csv
+import copy
+import json
+import logging
+import random
+from collections import defaultdict
+import torch
+from torch.nn.utils.rnn import pad_sequence
+MAX_CONTEXT_LEN = 50
+logger = logging.getLogger(__name__)
+def pad_squeeze_sequence(sequence, *args, **kwargs):
+    """Squeezes fake batch dimension added by tokenizer before padding sequence."""
+    return pad_sequence([x.squeeze(0) for x in sequence], *args, **kwargs)
+class OutputStorage:
+    """
+    This object stores the output of the given PyTorch module from its most recent forward pass,
+    which otherwise might not be retained.
+    """
+    def __init__(self, module):
+        self._stored_output = None
+        module.register_forward_hook(self.hook)
+    def hook(self, module, input, output):
+        self._stored_output = output
+    def get(self):
+        return self._stored_output
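+class ExponentialMovingAverage:
+    """
+    NOTE: Despite the name, this currently tracks a plain running mean of the values passed to
+    `update`; the `weight` argument is stored but not used.
+    """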
+    def __init__(self, weight=0.3):
+        self._weight = weight
+        self.reset()
+    def update(self, x):
+        self._x += x
+        self._i += 1
+    def reset(self):
+        self._x = 0
+        self._i = 0
+    def get_metric(self):
+        return self._x  / (self._i + 1e-13)
+class Collator:
+    """
+    Collates transformer outputs.
+    """
+    def __init__(self, pad_token_id=0):
+        self._pad_token_id = pad_token_id
+    def __call__(self, features):
+        # Separate the list of inputs and labels
+        model_inputs, labels = list(zip(*features))
+        # Assume that all inputs have the same keys as the first
+        proto_input = model_inputs[0]
+        keys = list(proto_input.keys())
+        padded_inputs = {}
+        for key in keys:
+            if key == 'input_ids':
+                padding_value = self._pad_token_id
+            else:
+                padding_value = 0
+            # NOTE: We need to squeeze to get rid of fake batch dim.
+            sequence = [x[key] for x in model_inputs]
+            padded = pad_squeeze_sequence(sequence, batch_first=True, padding_value=padding_value)
+            padded_inputs[key] = padded
+        labels = pad_squeeze_sequence(labels, batch_first=True, padding_value=0)
+        return padded_inputs, labels
+def encode_label(tokenizer, label, tokenize=False):
+    """
+    Helper function for encoding labels. Deals with the subtleties of handling multiple tokens.
+    """
+    if isinstance(label, str):
+        if tokenize:
+            # Ensure label is properly tokenized, and reject labels that get
+            # split into multiple tokens or mapped to unk. TODO: Make sure this is
+            # desired behavior.
+            tokens = tokenizer.tokenize(label)
+            if len(tokens) > 1:
+                raise ValueError(f'Label "{label}" gets mapped to multiple tokens.')
+            if tokens[0] == tokenizer.unk_token:
+                raise ValueError(f'Label "{label}" gets mapped to unk.')
+            label = tokens[0]
+        encoded = torch.tensor(tokenizer.convert_tokens_to_ids([label])).unsqueeze(0)
+    elif isinstance(label, list):
+        encoded = torch.tensor(tokenizer.convert_tokens_to_ids(label)).unsqueeze(0)
+    elif isinstance(label, int):
+        encoded = torch.tensor([[label]])
+    return encoded
+class TriggerTemplatizer:
+    """
+    An object to facilitate creating transformers-friendly trigger inputs from a template.
+    Parameters
+    ==========
+    template : str
+        The template string, comprised of the following tokens:
+            [T] to mark a trigger placeholder.
+            [P] to mark a prediction placeholder.
+            {fields} arbitrary fields instantiated from the dataset instances.
+        For example a NLI template might look like:
+            "[T] [T] [T] {premise} [P] {hypothesis}"
+    tokenizer : PretrainedTokenizer
+        A HuggingFace tokenizer. Must have special trigger and predict tokens.
+    add_special_tokens : bool
+        Whether or not to add special tokens when encoding. Default: False.
+    """
+    def __init__(self,
+                 template,
+                 config,
+                 tokenizer,
+                 label_field='label',
+                 label_map=None,
+                 tokenize_labels=False,
+                 add_special_tokens=False,
+                 use_ctx=False):
+        if not hasattr(tokenizer, 'predict_token') or \
+           not hasattr(tokenizer, 'trigger_token'):
+            raise ValueError(
+                'Tokenizer missing special trigger and predict tokens in vocab. '
+                'Use `utils.add_task_specific_tokens` to add them.'
+            )
+        self._template = template
+        self._config = config
+        self._tokenizer = tokenizer
+        self._label_field = label_field
+        self._label_map = label_map
+        self._tokenize_labels = tokenize_labels
+        self._add_special_tokens = add_special_tokens
+        self._use_ctx = use_ctx
+    @property
+    def num_trigger_tokens(self):
+        return sum(token == '[T]' for token in self._template.split())
+    def __call__(self, format_kwargs):
+        # Format the template string
+        format_kwargs = format_kwargs.copy()
+        label = format_kwargs.pop(self._label_field)
+        text = self._template.format(**format_kwargs)
+        if label is None:
+            raise Exception(f'Bad data: {text}')
+        # Have the tokenizer encode the text and process the output to:
+        # - Create a trigger and predict mask
+        # - Replace the predict token with a mask token
+        model_inputs = self._tokenizer.encode_plus(
+            text,
+            add_special_tokens=self._add_special_tokens,
+            return_tensors='pt'
+        )
+        input_ids = model_inputs['input_ids']
+        trigger_mask = input_ids.eq(self._tokenizer.trigger_token_id)
+        predict_mask = input_ids.eq(self._tokenizer.predict_token_id)
+        input_ids[predict_mask] = self._tokenizer.mask_token_id
+        model_inputs['trigger_mask'] = trigger_mask
+        model_inputs['predict_mask'] = predict_mask
+        # For relation extraction with BERT, update token_type_ids to reflect the two different sequences
+        if self._use_ctx and self._config.model_type == 'bert':
+            sep_token_indices = (input_ids.squeeze(0) == self._tokenizer.convert_tokens_to_ids(self._tokenizer.sep_token)).nonzero().flatten()
+            sequence_b_indices = torch.arange(sep_token_indices[0], sep_token_indices[1] + 1).long().unsqueeze(0)
+            model_inputs['token_type_ids'].scatter_(1, sequence_b_indices, 1)
+        # Encode the label(s)
+        if self._label_map is not None:
+            label = self._label_map[label]
+        label_id = encode_label(
+            tokenizer=self._tokenizer,
+            label=label,
+            tokenize=self._tokenize_labels
+        )
+        return model_inputs, label_id
+def add_task_specific_tokens(tokenizer):
+    tokenizer.add_special_tokens({
+        'additional_special_tokens': ['[T]', '[P]', '[Y]']
+    })
+    tokenizer.trigger_token = '[T]'
+    tokenizer.trigger_token_id = tokenizer.convert_tokens_to_ids('[T]')
+    tokenizer.predict_token = '[P]'
+    tokenizer.predict_token_id = tokenizer.convert_tokens_to_ids('[P]')
+    # NOTE: BERT and RoBERTa tokenizers work properly if [X] is not a special token...
+    # tokenizer.lama_x = '[X]'
+    # tokenizer.lama_x_id = tokenizer.convert_tokens_to_ids('[X]')
+    tokenizer.lama_y = '[Y]'
+    tokenizer.lama_y_id = tokenizer.convert_tokens_to_ids('[Y]')
+def load_tsv(fname):
+    with open(fname, 'r') as f:
+        reader = csv.DictReader(f, delimiter='\t')
+        for row in reader:
+            yield row
+def load_jsonl(fname):
+    with open(fname, 'r') as f:
+        for line in f:
+            yield json.loads(line)
+LOADERS = {
+    '.tsv': load_tsv,
+    '.jsonl': load_jsonl
+}
+def load_trigger_dataset(fname, templatizer, use_ctx, limit=None):
+    loader = LOADERS[fname.suffix]
+    instances = []
+    for x in loader(fname):
+        try:
+            if use_ctx:
+                # For relation extraction, skip facts that don't have context sentence
+                if 'evidences' not in x:
+                    logger.warning('Skipping RE sample because it lacks context sentences: {}'.format(x))
+                    continue
+                evidences = x['evidences']
+                # Randomly pick a context sentence
+                obj_surface, masked_sent = random.choice([(evidence['obj_surface'], evidence['masked_sentence']) for evidence in evidences])
+                words = masked_sent.split()
+                if len(words) > MAX_CONTEXT_LEN:
+                    # If the masked sentence is too long, keep only the first MAX_CONTEXT_LEN words. For training we want to keep as many samples as we can.
+                    masked_sent = ' '.join(words[:MAX_CONTEXT_LEN])
+                # If truncated context sentence still has MASK, we need to replace it with object surface
+                # We explicitly use [MASK] because all TREx facts' context sentences use it
+                context = masked_sent.replace('[MASK]', obj_surface)
+                x['context'] = context
+                model_inputs, label_id = templatizer(x)
+            else:
+                model_inputs, label_id = templatizer(x)
+        except ValueError as e:
+            logger.warning('Encountered error "%s" when processing "%s".  Skipping.', e, x)
+            continue
+        else:
+            instances.append((model_inputs, label_id))
+    if limit:
+        return random.sample(instances, limit)
+    else:
+        return instances
+def load_augmented_trigger_dataset(fname, templatizer, limit=None):
+    loader = LOADERS[fname.suffix]
+    instances = []
+    # For augmented relation extraction, we need to replace obj_label with another obj_label, and replace obj_surface with a surface form of the new obj_label
+    unique_objs_dict = defaultdict(list)
+    # Also for augmented relation extraction, we need to accumulate all facts and process them afterwards
+    facts = []
+    for x in loader(fname):
+        try:
+            sub_label = x['sub_label']
+            obj_label = x['obj_label']
+            # For relation extraction, skip facts that don't have context sentence
+            if 'evidences' not in x:
+                logger.warning('Skipping RE sample because it lacks context sentences: {}'.format(x))
+                continue
+            evidences = x['evidences']
+            # Gather all unique objects and their surface forms for augmented relation extraction
+            for evidence in evidences:
+                obj_surface = evidence['obj_surface']
+                masked_sent = evidence['masked_sentence']
+                unique_objs_dict[obj_label].append(obj_surface)
+            # Randomly pick a context sentence
+            obj_surface, masked_sent = random.choice([(evidence['obj_surface'], evidence['masked_sentence']) for evidence in evidences])
+            words = masked_sent.split()
+            if len(words) > MAX_CONTEXT_LEN:
+                # If the masked sentence is too long, keep only the first MAX_CONTEXT_LEN words. For training we want to keep as many samples as we can.
+                masked_sent = ' '.join(words[:MAX_CONTEXT_LEN])
+            x['context'] = masked_sent
+            facts.append(x)
+        except ValueError as e:
+            logger.warning('Encountered error "%s" when processing "%s".  Skipping.', e, x)
+    # Go through all facts and replace each object with a new one. Also insert the new object (surface form) into the masked sentence
+    synth_facts = []
+    for fact in facts:
+        sub_label = fact['sub_label']
+        obj_label = fact['obj_label']
+        masked_sent = fact['context']
+        # print('Original fact: ({}, {}, {})'.format(sub_label, obj_label, masked_sent))
+        synth_obj_label = random.choice([x for x in unique_objs_dict.keys() if x != obj_label])
+        synth_obj_surface = random.choice(unique_objs_dict[synth_obj_label])
+        synth_ctx = masked_sent.replace('[MASK]', synth_obj_surface)
+        # print('Synthetic fact: ({}, {}, {})\n'.format(sub_label, synth_obj_label, synth_ctx))
+        # Reassign the labels and context sentence
+        synth_fact = copy.deepcopy(fact)
+        synth_fact['sub_label'] = sub_label
+        synth_fact['obj_label'] = synth_obj_label
+        synth_fact['context'] = synth_ctx
+        synth_facts.append(synth_fact)
+    # Go through facts, templatize each one, then append them to instances
+    for fact in synth_facts:
+        model_inputs, label_id = templatizer(fact)
+        instances.append((model_inputs, label_id))
+    if limit:
+        return random.sample(instances, limit)
+    else:
+        return instances
+def load_classification_dataset(
+    fname,
+    tokenizer,
+    input_field_a,
+    input_field_b=None,
+    label_field='label',
+    label_map=None,
+    limit=None
+):
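+    """
+    Loads a dataset for classification
+    Parameters
+    ==========
+    fname : pathlib.Path
+        Path to a .tsv or .jsonl file; the suffix selects the loader.
+    tokenizer : transformers.PretrainedTokenizer
+        Maps text to id tensors.
+    input_field_a : str
+        Name of the field holding the first input text.
+    input_field_b : str, optional
+        Name of the field holding the second input text, for sentence-pair tasks.
+    label_field : str
+        Name of the field holding the label. Default: 'label'.
+    label_map : dict, optional
+        Mapping from labels to integer ids. Labels not yet present are assigned the next free id.
+    limit : int, optional
+        If given, return a random sample of this many instances.
+    """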
+    instances = []
+    label_map = label_map or {}
+    loader = LOADERS[fname.suffix]
+    for instance in loader(fname):
+        logger.debug(instance)
+        model_inputs = tokenizer.encode_plus(
+            instance[input_field_a],
+            instance[input_field_b] if input_field_b else None,
+            add_special_tokens=True,
+            # add_prefix_space=True,
+            return_tensors='pt'
+        )
+        logger.debug(model_inputs)
+        label = instance[label_field]
+        if label not in label_map:
+            label_map[label] = len(label_map)
+        label_id = label_map[label]
+        label_id = torch.tensor([[label_id]])  # Shape (1, 1), as the collator expects
+        logger.debug(f'Label id: {label_id}')
+        instances.append((model_inputs, label_id))
+    if limit:
+        instances = random.sample(instances, limit)
+    return instances, label_map
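
Note: the label handling in `load_classification_dataset` above assigns ids in order of first appearance. The following is a minimal standalone sketch of that step, not code from this commit; `build_label_map` is a hypothetical helper name, and it assumes labels are hashable values read in file order.

```python
def build_label_map(labels, label_map=None):
    """Assign each unseen label the next free integer id (hypothetical helper)."""
    # Copy so the caller's mapping is left untouched in this sketch.
    label_map = dict(label_map or {})
    label_ids = []
    for label in labels:
        if label not in label_map:
            # The next id is the current size of the map: 0, 1, 2, ...
            label_map[label] = len(label_map)
        label_ids.append(label_map[label])
    return label_ids, label_map


ids, mapping = build_label_map(['positive', 'negative', 'positive'])
```

Passing an existing `label_map` (for example the one returned for the training set) keeps ids consistent when a second file is loaded.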

prompts/fact_retrieval_bert_prompts.jsonl ADDED Viewed

	@@ -0,0 +1,41 @@

+{"relation": "P1001", "template": "[X]vik nationwide disabilities policing within [Y]."}
+{"relation": "P101", "template": "[X] probability earliest fame totaled studying [Y]."}
+{"relation": "P103", "template": "[X]PA communerug speaks proper [Y]."}
+{"relation": "P106", "template": "[X] supporters studied politicians musician turned [Y]."}
+{"relation": "P108", "template": "[X] 1987adeNBC computing succeeded [Y]."}
+{"relation": "P127", "template": "[X] is hindwings mainline architecture within [Y]."}
+{"relation": "P1303", "template": "[X] playingdrum concertoative electric [Y]."}
+{"relation": "P131", "template": "[X]ediatric close suburb throughout northwest [Y]."}
+{"relation": "P136", "template": "[X] freaking genre orchestra fiction acid [Y]."}
+{"relation": "P1376", "template": "[X] boasts native territory traditionally called [Y]."}
+{"relation": "P138", "template": "[X] consistslanche classical name of [Y]."}
+{"relation": "P140", "template": "[X]urn openly explicitly mosques practicing [Y]."}
+{"relation": "P1412", "template": "[X] receivedorganisation 1904 speaking only [Y]."}
+{"relation": "P159", "template": "[X] isnky galleries headquartered in [Y]."}
+{"relation": "P17", "template": "[X] is association footballled southeastern [Y]."}
+{"relation": "P176", "template": "[X] was flight series manufactured by [Y]."}
+{"relation": "P178", "template": "[X] is memory arcade branding by [Y]."}
+{"relation": "P19", "template": "[X] clocks literary economist relocated to [Y]."}
+{"relation": "P190", "template": "[X] proceeded worldwidedick offices near [Y]."}
+{"relation": "P20", "template": "[X] reorganizationotype photographic studio in [Y]."}
+{"relation": "P264", "template": "[X] cameo explanation\u00f6table sued [Y]."}
+{"relation": "P27", "template": "[X] m\u00b3 badminton pieces internationally representing [Y]."}
+{"relation": "P276", "template": "[X] consists kilograms centred neighborhoods in [Y]."}
+{"relation": "P279", "template": "[X] is \u00ee adequately termed coated [Y]."}
+{"relation": "P30", "template": "[X] is commune polar continent in [Y]."}
+{"relation": "P31", "template": "[X] isious 1970s southwardlier [Y]."}
+{"relation": "P36", "template": "[X] includesiidae geologic countryside near [Y]."}
+{"relation": "P361", "template": "[X] isaul archaic section of [Y]."}
+{"relation": "P364", "template": "[X]dak \u20ac dancers speak standard [Y]."}
+{"relation": "P37", "template": "[X]inen dialects resembled officially exclusively [Y]."}
+{"relation": "P39", "template": "[X] explorers voting municipal \u2192 consecrated [Y]."}
+{"relation": "P407", "template": "[X] playedi\u0107 every dialect but [Y]."}
+{"relation": "P413", "template": "[X] played colors skier \u2194 defensive [Y]."}
+{"relation": "P449", "template": "[X] uncredited recording remake aired on [Y]."}
+{"relation": "P463", "template": "[X] splits artisticlogy prior joining [Y]."}
+{"relation": "P47", "template": "[X] shares undrafted border northeast neighbours [Y]."}
+{"relation": "P495", "template": "[X] album spanninggie chart in [Y]."}
+{"relation": "P527", "template": "[X] nickname involves \u032f\u00bddized [Y]."}
+{"relation": "P530", "template": "[X] nightclubrah preceding relations with [Y]."}
+{"relation": "P740", "template": "[X] refers drum blog centred downtown [Y]."}
+{"relation": "P937", "template": "[X] vol \u300elson gallery in [Y]."}
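
Note: each line of the prompt files is a JSON object with a `relation` id and a `template`. A minimal sketch of how such a template might be turned into a cloze query follows, assuming (as in LAMA-style templates) that `[X]` marks the subject slot and `[Y]` the object slot. `fill_template` and the subject `iPhone` are illustrative and do not come from this commit.

```python
import json


def fill_template(template, subject, obj):
    """Substitute the subject and object slots of a prompt template (hypothetical helper)."""
    return template.replace('[X]', subject).replace('[Y]', obj)


# One line copied from prompts/fact_retrieval_bert_prompts.jsonl.
line = '{"relation": "P176", "template": "[X] was flight series manufactured by [Y]."}'
record = json.loads(line)

# Put BERT's mask token in the object slot so a masked language model can predict it.
query = fill_template(record['template'], 'iPhone', '[MASK]')
print(query)
```

For the RoBERTa prompt files the mask token would be `<mask>` rather than `[MASK]`.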

prompts/fact_retrieval_roberta_prompts.jsonl ADDED Viewed

	@@ -0,0 +1,41 @@

+{"relation": "P1001", "template": " [X]\u00a2List unsu rivers spanning [Y] ."}
+{"relation": "P101", "template": " [X] 1830 dissertation applying mathsucci [Y] ."}
+{"relation": "P103", "template": " [X]neau optionally fluent!?\" traditional [Y] ."}
+{"relation": "P106", "template": " [X] (), astronomers businessman\u00b7former [Y] ."}
+{"relation": "P108", "template": " [X] heads opio computer divisionersen [Y] ."}
+{"relation": "P127", "template": " [X] picThom unwillingness officially governs [Y] ."}
+{"relation": "P1303", "template": " [X]Trump learned soloKeefe classical [Y] ."}
+{"relation": "P131", "template": " [X] scenic neighbourhood occurred enqu northeastern [Y] ."}
+{"relation": "P136", "template": " [X] blends postwar hostage drama sax [Y] ."}
+{"relation": "P1376", "template": " [X] limestone depositedati boroughDepending [Y] ."}
+{"relation": "P138", "template": " [X] =alysis northern spellingSaint [Y] ."}
+{"relation": "P140", "template": " [X] traced pagan fascism individuality extremist [Y] ."}
+{"relation": "P1412", "template": " [X] translatedANCauld writings binaries [Y] ."}
+{"relation": "P159", "template": " [X] spinsCompany organisedLocation near [Y] ."}
+{"relation": "P17", "template": " [X]exec scenic provinces iodine northeastern [Y] ."}
+{"relation": "P176", "template": " [X] 125definition enormously stunned manufacturer [Y] ."}
+{"relation": "P178", "template": " [X] 1987 floppy simulator users sued [Y] ."}
+{"relation": "P19", "template": " [X] 2002 protesting disco constructionamine [Y] ."}
+{"relation": "P190", "template": " [X] flight facultiesyna arrivesfolios [Y] ."}
+{"relation": "P20", "template": " [X].. enigmatic twentieth nowadays near [Y] ."}
+{"relation": "P264", "template": " [X] touring 1958 defunct videog label [Y] ."}
+{"relation": "P27", "template": " [X] offic organise forests statutes northwestern [Y] ."}
+{"relation": "P276", "template": " [X] manoeuv constructs whistleblowers hills near [Y] ."}
+{"relation": "P279", "template": " [X],formerly prayers unstaceous [Y] ."}
+{"relation": "P30", "template": " [X] coral caves symb polar zone [Y] ."}
+{"relation": "P31", "template": " [X] (), therapists nationallyrecorded enchanted [Y] ."}
+{"relation": "P36", "template": " [X] 1954 misinterpretburg narrowly battered [Y] ."}
+{"relation": "P361", "template": " [X], supplementaryfoot structuresNorthern [Y] ."}
+{"relation": "P364", "template": " [X]vanathering preferred languagesEnglish [Y] ."}
+{"relation": "P37", "template": " [X]onen tribes descending speak mainly [Y] ."}
+{"relation": "P39", "template": " [X] billionaire elected unp\u200b\u200bCatholic [Y] ."}
+{"relation": "P407", "template": " [X] scaven pronunciation.*Wikipedia speaks [Y] ."}
+{"relation": "P413", "template": " [X],'' (), ex-,Liverpool [Y] ."}
+{"relation": "P449", "template": " [X] premiered 1989 simulatively instinctively [Y] ."}
+{"relation": "P463", "template": " [X] joins reformedolitical endangered grouping [Y] ."}
+{"relation": "P47", "template": " [X] combinesfill marry territory surrounding [Y] ."}
+{"relation": "P495", "template": " [X] condom announces manufacturer residence exported [Y] ."}
+{"relation": "P527", "template": " [X] minus asylum cooked = compressed [Y] ."}
+{"relation": "P530", "template": " [X]varOriginally kidnappedstrate neighboring [Y] ."}
+{"relation": "P740", "template": " [X] prefersLondon whilst 182 favors [Y] ."}
+{"relation": "P937", "template": " [X] bicycles investments railway neighborhoodAlternatively [Y] ."}

prompts/relation_extraction_bert_prompts.jsonl ADDED Viewed

	@@ -0,0 +1,39 @@

+{"relation": "P1001", "template": "[X] dispatched state consul federally to [Y]."}
+{"relation": "P101", "template": "[X]icidalology fascinated textbook on [Y]."}
+{"relation": "P103", "template": "[X] sent literary visa speaking predominantly [Y]."}
+{"relation": "P106", "template": "[X] as invented firstractical aspiring [Y]."}
+{"relation": "P108", "template": "[X] funded transmissions business involvement at [Y]."}
+{"relation": "P127", "template": "[X] sentuti limo sponsorship to [Y]."}
+{"relation": "P1303", "template": "[X] ] podcast 1935 practices unison [Y]."}
+{"relation": "P131", "template": "[X] fewer congressional consul corporation bordering [Y]."}
+{"relation": "P136", "template": "[X] drama bacteriatitled 80s cosmic [Y]."}
+{"relation": "P138", "template": "[X] positively cited the town nicknamed [Y]."}
+{"relation": "P140", "template": "[X] 2006 revelation convertedtsky practiced [Y]."}
+{"relation": "P1412", "template": "[X] imported colleges translations exports speak [Y]."}
+{"relation": "P159", "template": "[X]rica headquartered town across from [Y]."}
+{"relation": "P17", "template": "[X] constituteronological country embassy to [Y]."}
+{"relation": "P176", "template": "[X] became plays sponsor co with [Y]."}
+{"relation": "P178", "template": "[X] game handed showcased separately by [Y]."}
+{"relation": "P19", "template": "[X]lancheheim grew house in [Y]."}
+{"relation": "P190", "template": "[X] attended waived both cities including [Y]."}
+{"relation": "P20", "template": "[X]rseyjee maintained apartment in [Y]."}
+{"relation": "P264", "template": "[X] became commemorated label label succeeding [Y]."}
+{"relation": "P27", "template": "[X] country goals diaspora diplomat visited [Y]."}
+{"relation": "P276", "template": "[X] visited crore sister town to [Y]."}
+{"relation": "P279", "template": "[X]districtutical\u00e8ne word resembling [Y]."}
+{"relation": "P30", "template": "[X] subfamily pardon globallyinae throughout [Y]."}
+{"relation": "P31", "template": "[X] nm charitiespository nicknamed underwater [Y]."}
+{"relation": "P36", "template": "[X] sued wraps owner city of [Y]."}
+{"relation": "P361", "template": "[X] passwordU emblem inspired by [Y]."}
+{"relation": "P364", "template": "[X] translated mistress culturally language notably [Y]."}
+{"relation": "P37", "template": "[X] called countries speaking originually [Y]."}
+{"relation": "P39", "template": "[X]lina \u2500 fifteenthously supreme [Y]."}
+{"relation": "P407", "template": "[X] sent vocalist languages foreign especially [Y]."}
+{"relation": "P413", "template": "[X] acts sentiment rookie minimum prone [Y]."}
+{"relation": "P449", "template": "[X] novels channel similarly also joined [Y]."}
+{"relation": "P463", "template": "[X] member testified frontman founded also [Y]."}
+{"relation": "P47", "template": "[X] became consulate will include [Y]."}
+{"relation": "P495", "template": "[X] shows website country abroad includes [Y]."}
+{"relation": "P530", "template": "[X] send globally dedicated embassy to [Y]."}
+{"relation": "P740", "template": "[X] music compliment residents resident in [Y]."}
+{"relation": "P937", "template": "[X] described courthouse residency career near [Y]."}

prompts/relation_extraction_roberta_prompts.jsonl ADDED Viewed

	@@ -0,0 +1,39 @@

+{"relation": "P1001", "template": "[X] congratulated killers counterparts residing outage [Y] ."}
+{"relation": "P101", "template": "[X]itations illustratingModern\u2010 risked [Y] ."}
+{"relation": "P103", "template": "[X] website canceled learn languageposition [Y] ."}
+{"relation": "P106", "template": "[X]Officersoglu internationally renown trained [Y] ."}
+{"relation": "P108", "template": "[X] culinary \u00a9 fixtures file courtesy [Y] ."}
+{"relation": "P127", "template": "[X] proudly celebrating playthrough ties cultured [Y] ."}
+{"relation": "P1303", "template": "[X] gubernatorial 410 antique vibritone [Y] ."}
+{"relation": "P131", "template": "[X] \"{\">phys town bordering [Y] ."}
+{"relation": "P136", "template": "[X] poking maneuvers genre synonymous baseline [Y] ."}
+{"relation": "P138", "template": "[X] slideshow painting spelling homage ()); [Y] ."}
+{"relation": "P140", "template": "[X] modified kosher spiritualitycert imitate [Y] ."}
+{"relation": "P1412", "template": "[X] translating pled spoken callback fluent [Y] ."}
+{"relation": "P159", "template": "[X] hometown bonding hahaVisit downtown [Y] ."}
+{"relation": "P17", "template": "[X] embassy factual diplomatic ambassadorooked [Y] ."}
+{"relation": "P176", "template": "[X] sponsorship respectfully complimentary courtesy fuckin [Y] ."}
+{"relation": "P178", "template": "[X] wikiPlanetSOURCE sponsored reckon [Y] ."}
+{"relation": "P19", "template": "[X] slideshow referencing correctness hometown continent [Y] ."}
+{"relation": "P190", "template": "[X] planetaking luggage transfer reaching [Y] ."}
+{"relation": "P20", "template": "[X] ironically resided located recalling downtown [Y] ."}
+{"relation": "P264", "template": "[X] claims primary label membershipdisc [Y] ."}
+{"relation": "P27", "template": "[X] smugglers smuggled davidjl forcibly affordability [Y] ."}
+{"relation": "P276", "template": "[X] photographed>:Folder cliffs overlooking [Y] ."}
+{"relation": "P279", "template": "[X]enez sculpture disguised mailboxSensor [Y] ."}
+{"relation": "P30", "template": "[X] tropical continent tropicalmessageoutheastern [Y] ."}
+{"relation": "P31", "template": "[X] bullies campuses hypothetical substitutiononic [Y] ."}
+{"relation": "P36", "template": "[X] border*.NOWVisit downtown [Y] ."}
+{"relation": "P361", "template": "[X] ~/FlickrFORE blessing representing [Y] ."}
+{"relation": "P364", "template": "[X] population language predomin smoker installer [Y] ."}
+{"relation": "P37", "template": "[X] screamed visibly fluent descendants nutrients [Y] ."}
+{"relation": "P39", "template": "[X] slideshow photo\u30aa workforce appointed [Y] ."}
+{"relation": "P407", "template": "[X] screened pioneering documentaries translated curry [Y] ."}
+{"relation": "P413", "template": "[X] learnedailed springTherefore veteran [Y] ."}
+{"relation": "P449", "template": "[X] slideshow courtesy recommendation television broadcaster [Y] ."}
+{"relation": "P463", "template": "[X] facebook referencing summarizes monikerTeam [Y] ."}
+{"relation": "P47", "template": "[X] aquatic contacted diplomatic consulate imperialist [Y] ."}
+{"relation": "P495", "template": "[X] webpage highlighting cultural exportsrero [Y] ."}
+{"relation": "P530", "template": "[X]ECD establishes diplomatic ties fut [Y] ."}
+{"relation": "P740", "template": "[X]empt adjoining merchants utilized downtown [Y] ."}
+{"relation": "P937", "template": "[X]\u00df died\"\" 1931 barbecue [Y] ."}

pytest.ini ADDED Viewed

	@@ -0,0 +1,5 @@

+[pytest]
+testpaths = tests/
+pythonpath = ./
+log_format = %(asctime)s - %(levelname)s - %(name)s - %(message)s
+log_level = DEBUG

requirements.txt ADDED Viewed

	@@ -0,0 +1,11 @@

+streamlit==0.79.0
+tqdm==4.49.0
+pandas==1.2.1
+numpy==1.17.2
+torch==1.4.0
+transformers==2.9.1
+spacy==2.2.0
+termcolor==1.1.0
+colorama==0.4.1
+matplotlib==3.1.1
+pytest

scripts/run_fact_retrieval_example.sh ADDED Viewed

	@@ -0,0 +1,32 @@

+#!/bin/bash
+# Experiment 8
+# Task: fact retrieval
+# Model: RoBERTa
+# Batch size: 56
+# Iters: 1000
+# Filtering: True
+datadir=$1
+logfile=$2
+# Clear files
+cat /dev/null > "$logfile"
+cat /dev/null > "${logfile}.log"
+for path in "$datadir"/*; do
+    filename=$(basename "$path")
+    time CUDA_VISIBLE_DEVICES=3 python -m autoprompt.create_trigger \
+        --train "$path/train.jsonl" \
+        --dev "$path/dev.jsonl" \
+        --template '<s> {sub_label} [T] [T] [T] [T] [T] [P] . </s>' \
+        --num-cand 10 \
+        --accumulation-steps 1 \
+        --model-name roberta-large \
+        --bsz 56 \
+        --eval-size 56 \
+        --iters 1000 \
+        --label-field 'obj_label' \
+        --tokenize-labels \
+        --filter \
+        --print-lama >> "$logfile" 2>> "${logfile}.log"
+done

scripts/run_relation_extraction_example.sh ADDED Viewed

	@@ -0,0 +1,33 @@

+#!/bin/bash
+# Experiment 9
+# Task: relation extraction
+# Model: BERT
+# Batch size: 32
+# Iters: 500
+# Filtering: True
+datadir=$1
+logfile=$2
+# Clear files
+cat /dev/null > "$logfile"
+cat /dev/null > "${logfile}.log"
+for path in "$datadir"/*; do
+    filename=$(basename "$path")
+    time CUDA_VISIBLE_DEVICES=4 python -m autoprompt.create_trigger \
+        --train "$path/train.jsonl" \
+        --dev "$path/dev.jsonl" \
+        --template '[CLS] {context} [SEP] {sub_label} [T] [T] [T] [T] [T] [P] . [SEP]' \
+        --num-cand 10 \
+        --accumulation-steps 1 \
+        --model-name bert-base-cased \
+        --bsz 32 \
+        --eval-size 32 \
+        --iters 500 \
+        --label-field 'obj_label' \
+        --tokenize-labels \
+        --filter \
+        --print-lama \
+        --use-ctx >> "$logfile" 2>> "${logfile}.log"
+done

setup.py ADDED Viewed

	@@ -0,0 +1,29 @@

+import setuptools
+# Load README to get long description.
+with open('README.md') as f:
+    _LONG_DESCRIPTION = f.read()
+setuptools.setup(
+    name='autoprompt',
+    version='0.0.1',
+    description='AutoPrompt',
+    long_description=_LONG_DESCRIPTION,
+    long_description_content_type='text/markdown',
+    author='UCI NLP',
+    url='https://github.com/ucinlp/autoprompt',
+    packages=setuptools.find_packages(),
+    install_requires=[ ],
+    extras_require={
+        'test': ['pytest']
+    },
+    classifiers=[
+        'Intended Audience :: Science/Research',
+        'Topic :: Scientific/Engineering :: Artificial Intelligence',
+    ],
+    keywords='text nlp machinelearning',
+)

tests/test_create_trigger.py ADDED Viewed

	@@ -0,0 +1,63 @@

+from unittest import TestCase
+import torch
+from transformers import AutoConfig, AutoModelWithLMHead, AutoTokenizer
+import autoprompt.create_trigger as ct
+def _load(model_name):
+    config = AutoConfig.from_pretrained(model_name)
+    model = AutoModelWithLMHead.from_pretrained(model_name)
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+    return config, model, tokenizer
+class TestGetEmbeddings(TestCase):
+    def test_bert(self):
+        model_name = 'bert-base-cased'
+        config, model, tokenizer = _load(model_name)
+        embeddings = ct.get_embeddings(model, config)
+        self.assertEqual(embeddings.weight.shape[0], config.vocab_size)
+    def test_roberta(self):
+        model_name = 'roberta-base'
+        config, model, tokenizer = _load(model_name)
+        embeddings = ct.get_embeddings(model, config)
+        self.assertEqual(embeddings.weight.shape[0], config.vocab_size)
+class TestGradientStorage(TestCase):
+    def test_gradient_storage(self):
+        num_embeddings = 3
+        embedding_dim = 4
+        embeddings = torch.nn.Embedding(num_embeddings, embedding_dim)
+        embedding_storage = ct.GradientStorage(embeddings)
+        inputs = torch.tensor([0, 1, 2, 1])
+        outputs = embeddings(inputs)
+        outputs.retain_grad()
+        loss = outputs.sum()
+        loss.backward()
+        assert torch.equal(outputs.grad, embedding_storage.get())
+def test_replace_trigger_tokens():
+    model_inputs = {
+        'input_ids': torch.tensor([
+            [1, 2, 3, 4],
+            [1, 1, 1, 0]
+        ])
+    }
+    trigger_ids = torch.tensor([[5, 6]])
+    trigger_mask = torch.tensor([
+            [True, True, False, False],
+            [False, True, False, True]
+    ])
+    replaced = ct.replace_trigger_tokens(model_inputs, trigger_ids, trigger_mask)
+    expected = torch.tensor([
+        [5, 6, 3, 4],
+        [1, 5, 1, 6]
+    ])
+    assert torch.equal(expected, replaced['input_ids'])
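
Note: the behavior asserted by `test_replace_trigger_tokens` above can be shown with plain Python lists. This is only an illustrative sketch, not the tensor implementation in `autoprompt/create_trigger.py`; `replace_trigger_tokens_lists` is a hypothetical name, and the sketch assumes every row has exactly as many masked positions as there are trigger ids.

```python
def replace_trigger_tokens_lists(input_ids, trigger_ids, trigger_mask):
    """Fill the masked positions of each row, left to right, with the trigger ids."""
    replaced = []
    for row, mask_row in zip(input_ids, trigger_mask):
        # Restart from the first trigger id for every row in the batch.
        triggers = iter(trigger_ids)
        replaced.append([
            next(triggers) if is_trigger else token
            for token, is_trigger in zip(row, mask_row)
        ])
    return replaced


# Same values as in the test above.
result = replace_trigger_tokens_lists(
    input_ids=[[1, 2, 3, 4], [1, 1, 1, 0]],
    trigger_ids=[5, 6],
    trigger_mask=[[True, True, False, False], [False, True, False, True]],
)
```

The same trigger sequence is written into every row, which is why a single `trigger_ids` row can serve a whole batch.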

tests/test_utils.py ADDED Viewed

	@@ -0,0 +1,159 @@

+from unittest import TestCase
+import torch
+from torch.utils.data import DataLoader
+from transformers import AutoConfig, AutoTokenizer
+import autoprompt.utils as utils
+class TestEncodeLabel(TestCase):
+    def setUp(self):
+        self._tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
+    def test_single_token(self):
+        output = utils.encode_label(self._tokenizer, 'the')
+        expected_output = torch.tensor([self._tokenizer.convert_tokens_to_ids(['the'])])
+        assert torch.equal(output, expected_output)
+    def test_multiple_tokens(self):
+        output = utils.encode_label(self._tokenizer, ['a', 'the'])
+        expected_output = torch.tensor([
+            self._tokenizer.convert_tokens_to_ids(['a', 'the'])
+        ])
+        assert torch.equal(output, expected_output)
+class TestTriggerTemplatizer(TestCase):
+    def setUp(self):
+        self.default_template = '[T] [T] {arbitrary} [T] {fields} [P]'
+        self.default_config = AutoConfig.from_pretrained('bert-base-cased')
+        self.default_tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
+        utils.add_task_specific_tokens(self.default_tokenizer)
+        self.default_instance = {
+            'arbitrary': 'does this',
+            'fields': 'work',
+            'label': 'and'
+        }
+    def test_bert(self):
+        templatizer = utils.TriggerTemplatizer(
+            self.default_template,
+            self.default_config,
+            self.default_tokenizer,
+            add_special_tokens=False
+        )
+        model_inputs, label = templatizer(self.default_instance)
+        # Label should be mapped to its token id
+        expected_label = torch.tensor([self.default_tokenizer.convert_tokens_to_ids([self.default_instance['label']])])
+        assert torch.equal(expected_label, label)
+        # For BERT, the output is expected to have the following keys
+        assert 'input_ids' in model_inputs
+        assert 'token_type_ids' in model_inputs
+        assert 'attention_mask' in model_inputs
+        # Test that the custom masks match our expectations
+        expected_trigger_mask = torch.tensor(
+            [[True, True, False, False, True, False, False]]
+        )
+        assert torch.equal(expected_trigger_mask, model_inputs['trigger_mask'])
+        expected_predict_mask = torch.tensor(
+            [[False, False, False, False, False, False, True]]
+        )
+        assert torch.equal(expected_predict_mask, model_inputs['predict_mask'])
+        # Lastly, ensure [P] is replaced by a [MASK] token
+        input_ids = model_inputs['input_ids']
+        predict_mask = model_inputs['predict_mask']
+        predict_token_id = input_ids[predict_mask].squeeze().item()
+        assert predict_token_id == self.default_tokenizer.mask_token_id
+    def test_roberta(self):
+        config = AutoConfig.from_pretrained('roberta-base')
+        tokenizer = AutoTokenizer.from_pretrained('roberta-base')
+        utils.add_task_specific_tokens(tokenizer)
+        templatizer = utils.TriggerTemplatizer(
+            self.default_template,
+            config,
+            tokenizer,
+            add_special_tokens=False
+        )
+        model_inputs, label = templatizer(self.default_instance)
+        # Label should be mapped to its token id
+        expected_label = torch.tensor([tokenizer.convert_tokens_to_ids([self.default_instance['label']])])
+        assert torch.equal(expected_label, label)
+        # For RoBERTa, the output is expected to have the following keys
+        print(model_inputs)
+        assert 'input_ids' in model_inputs
+        assert 'attention_mask' in model_inputs
+        # Test that the custom masks match our expectations
+        expected_trigger_mask = torch.tensor(
+            [[True, True, False, False, True, False, False]]
+        )
+        assert torch.equal(expected_trigger_mask, model_inputs['trigger_mask'])
+        expected_predict_mask = torch.tensor(
+            [[False, False, False, False, False, False, True]]
+        )
+        assert torch.equal(expected_predict_mask, model_inputs['predict_mask'])
+        # Lastly, ensure [P] is replaced by a [MASK] token
+        input_ids = model_inputs['input_ids']
+        predict_mask = model_inputs['predict_mask']
+        predict_token_id = input_ids[predict_mask].squeeze().item()
+        assert predict_token_id == tokenizer.mask_token_id
+class TestCollator(TestCase):
+    def test_collator(self):
+        template = '[T] [T] {arbitrary} [T] {fields} [P]'
+        tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
+        config = AutoConfig.from_pretrained('bert-base-cased')
+        utils.add_task_specific_tokens(tokenizer)
+        templatizer = utils.TriggerTemplatizer(
+            template,
+            config,
+            tokenizer,
+            add_special_tokens=False
+        )
+        collator = utils.Collator(pad_token_id=tokenizer.pad_token_id)
+        instances = [
+            {'arbitrary': 'a', 'fields': 'the', 'label': 'hot'},
+            {'arbitrary': 'a a', 'fields': 'the the', 'label': 'cold'}
+        ]
+        templatized_instances = [templatizer(x) for x in instances]
+        loader = DataLoader(
+            templatized_instances,
+            batch_size=2,
+            shuffle=False,
+            collate_fn=collator
+        )
+        model_inputs, labels = next(iter(loader))
+        # Check results match our expectations
+        expected_labels = torch.tensor([
+            tokenizer.encode('hot', add_special_tokens=False, add_prefix_space=True),
+            tokenizer.encode('cold', add_special_tokens=False, add_prefix_space=True),
+        ])
+        assert torch.equal(expected_labels, labels)
+        expected_trigger_mask = torch.tensor([
+            [True, True, False, True, False, False, False, False],
+            [True, True, False, False, True, False, False, False],
+        ])
+        assert torch.equal(expected_trigger_mask, model_inputs['trigger_mask'])
+        expected_predict_mask = torch.tensor([
+            [False, False, False, False, False, True, False, False],
+            [False, False, False, False, False, False, False, True],
+        ])
+        assert torch.equal(expected_predict_mask, model_inputs['predict_mask'])