---
language: en
license: mit
tags:
  - pytorch
  - causal-lm
  - language-model
  - flash-attention
datasets:
  - Salesforce/wikitext
pipeline_tag: text-generation
---

# PurelyUnfunctionalAI/GibberishGPT

A lightweight decoder-only transformer language model trained with Flash Attention on the WikiText dataset. It was built as an exercise in training LLMs and in ML pipelines: the model does not produce coherent text, but it serves as a good starting point for learning more about LLMs (source code on GitHub).

## Model Details

- **Model Type**: Causal Language Model
- **Architecture**: Decoder-only Transformer
- **Embedding Size**: 512
- **Hidden Layers**: 8
- **Attention Heads**: 8
- **Context Length**: 512
- **Flash Attention**: Enabled (see the sketch after this list)
- **Training Data**: Salesforce/wikitext
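
For orientation, an attention block matching these hyperparameters (embedding size 512, 8 heads, causal masking) can be sketched as below. This is a minimal illustration, not the repository's actual code; it assumes PyTorch's `scaled_dot_product_attention`, which dispatches to a FlashAttention kernel when one is available.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Illustrative block matching the card's hyperparameters
    (embedding size 512, 8 heads). Hypothetical, not the repo's code."""
    def __init__(self, n_embd: int = 512, n_head: int = 8):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused Q/K/V projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape to (B, n_head, T, head_dim) for multi-head attention
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # PyTorch uses a FlashAttention kernel here when available
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```

The fused kernel avoids materializing the full T×T attention matrix, which is what makes Flash Attention memory-efficient even at the full 512-token context.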

## Usage

```python
import torch
import tiktoken
from transformers import AutoModelForCausalLM

# Load the tokenizer
tokenizer = tiktoken.get_encoding("gpt2")

# Load the model
model = AutoModelForCausalLM.from_pretrained("PurelyUnfunctionalAI/GibberishGPT")

# Encode input
input_text = "Your prompt here"
input_ids = tokenizer.encode(input_text)
input_tensor = torch.tensor([input_ids], dtype=torch.long)

# Generate
output = model.generate(input_tensor, max_length=100)
generated_text = tokenizer.decode(output[0].tolist())
print(generated_text)
```
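
The call above uses transformers' default greedy decoding, and `max_length` counts the prompt tokens as well as the generated ones. For more varied (if still incoherent) output, the library's standard sampling arguments apply; the values below are illustrative, not tuned for this model:

```python
# Sample rather than decode greedily; all values are illustrative
output = model.generate(
    input_tensor,
    max_new_tokens=100,  # cap on newly generated tokens (excludes the prompt)
    do_sample=True,      # draw from the token distribution
    temperature=0.8,     # <1 sharpens, >1 flattens the distribution
    top_k=50,            # restrict sampling to the 50 most likely tokens
)
print(tokenizer.decode(output[0].tolist()))
```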

## Limitations

- The model has a context length of 512 tokens, so longer prompts must be truncated (see the sketch after this list)
- It was trained on WikiText data, which may not cover specialized domains
- As a lightweight model, it may not perform as well as larger LLMs on complex tasks
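
Given the 512-token limit, one simple way to handle long prompts is to keep only the most recent tokens and leave headroom for generation. A minimal sketch, reusing `tokenizer` and `input_text` from the Usage section (the headroom value is an arbitrary choice):

```python
MAX_CONTEXT = 512  # model context length, per Model Details
HEADROOM = 100     # tokens reserved for generation; arbitrary choice

input_ids = tokenizer.encode(input_text)
# Drop the oldest tokens if the prompt would not fit alongside new tokens
budget = MAX_CONTEXT - HEADROOM
if len(input_ids) > budget:
    input_ids = input_ids[-budget:]
input_tensor = torch.tensor([input_ids], dtype=torch.long)
```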

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{GibberishGPT,
  author = {Gathara, Michael and Menon, Vaishak and Liu, Jason},
  title = {GibberishGPT: A Lightweight Language Model with Flash Attention},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face model repository},
  howpublished = {\url{https://huggingface.co/PurelyUnfunctionalAI/GibberishGPT}}
}
```