arxiv:2205.05789

RITA: a Study on Scaling Up Generative Protein Sequence Models

Published on May 11, 2022

Authors:

Abstract

AI-generated summary: RITA, a suite of autoregressive generative models for protein sequences, is evaluated across model sizes on next amino acid prediction, zero-shot fitness, and enzyme function prediction, with the aim of accelerating protein design.

In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences belonging to the UniRef-100 database. Such generative models hold the promise of greatly accelerating protein design. We conduct the first systematic study of how capabilities evolve with model size for autoregressive transformers in the protein domain: we evaluate RITA models in next amino acid prediction, zero-shot fitness, and enzyme function prediction, showing benefits from increased scale. We release the RITA models openly, to the benefit of the research community.
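Two of the evaluations named above, next amino acid prediction and zero-shot fitness, both reduce to scoring sequences with the left-to-right factorization p(x) = Π p(x_i | x_<i). The sketch below illustrates this under stated assumptions. It is not RITA's code or the paper's exact evaluation protocol. The `NextTokenModel` interface, `uniform_model`, and `biased_model` are hypothetical stand-ins for a trained transformer's softmax output. Perplexity is computed as exp of the mean per-residue negative log-likelihood. Zero-shot fitness is scored as the log-likelihood ratio between a single-point mutant and its wild type, a common heuristic for generative protein models.

```python
import math
from typing import Callable, Dict

# The 20 standard amino acids (an assumption for this toy example).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical interface: given the prefix seen so far, return a probability
# distribution over the next amino acid. A real model would produce this
# from the softmax over its output logits.
NextTokenModel = Callable[[str], Dict[str, float]]


def sequence_log_likelihood(model: NextTokenModel, seq: str) -> float:
    """Sum of log p(x_i | x_<i) over the sequence (autoregressive factorization)."""
    total = 0.0
    for i, aa in enumerate(seq):
        probs = model(seq[:i])  # condition only on residues to the left
        total += math.log(probs[aa])
    return total


def perplexity(model: NextTokenModel, seq: str) -> float:
    """exp of the mean negative log-likelihood per residue; lower is better."""
    return math.exp(-sequence_log_likelihood(model, seq) / len(seq))


def zero_shot_fitness(model: NextTokenModel, wild_type: str,
                      position: int, mutant_aa: str) -> float:
    """Log-likelihood ratio log p(mutant) - log p(wild type).

    Positive values mean the model finds the mutant more likely than the
    wild type, which is read as a predicted fitness gain.
    """
    if mutant_aa not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid: {mutant_aa!r}")
    mutant = wild_type[:position] + mutant_aa + wild_type[position + 1:]
    return (sequence_log_likelihood(model, mutant)
            - sequence_log_likelihood(model, wild_type))


def uniform_model(prefix: str) -> Dict[str, float]:
    """Hypothetical baseline: every amino acid is equally likely."""
    p = 1.0 / len(AMINO_ACIDS)
    return {aa: p for aa in AMINO_ACIDS}


def biased_model(prefix: str) -> Dict[str, float]:
    """Hypothetical toy model: 'A' gets half the mass, the rest share the other half."""
    other = 0.5 / (len(AMINO_ACIDS) - 1)
    return {aa: (0.5 if aa == "A" else other) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    print(perplexity(uniform_model, "MKVLA"))              # about 20, the alphabet size
    print(zero_shot_fitness(biased_model, "MKV", 1, "A"))  # positive: K -> A is favored
```

A uniform model has perplexity equal to the alphabet size, which gives a useful sanity baseline: a trained model should score well below 20 on held-out sequences. Swapping in a real model only requires replacing the `NextTokenModel` callable. For long sequences, a real implementation would compute all conditionals in one batched forward pass instead of one call per prefix.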


