---
language: en
license: mit
tags:
- pytorch
- causal-lm
- language-model
- flash-attention
datasets:
- Salesforce/wikitext
pipeline_tag: text-generation
---

# PurelyUnfunctionalAI/GibberishGPT

A lightweight decoder-only transformer language model trained with Flash Attention on the WikiText dataset. It was built as a learning exercise in training LLMs and ML pipelines. The model does not produce coherent text, but it serves as a good starting point for learning more about LLMs.

## Model Details

- **Model Type:** Causal Language Model
- **Architecture:** Decoder-only Transformer
- **Embedding Size:** 512
- **Hidden Layers:** 8
- **Attention Heads:** 8
- **Context Length:** 512 tokens
- **Flash Attention:** Enabled (see the sketch at the end of this card)
- **Training Data:** Salesforce/wikitext

## Usage

```python
import torch
import tiktoken
from transformers import AutoModelForCausalLM

# Load the tokenizer (the model uses the GPT-2 BPE vocabulary via tiktoken)
tokenizer = tiktoken.get_encoding("gpt2")

# Load the model and switch to inference mode
model = AutoModelForCausalLM.from_pretrained("PurelyUnfunctionalAI/GibberishGPT")
model.eval()

# Encode the prompt into a batch of shape (1, seq_len)
input_text = "Your prompt here"
input_ids = tokenizer.encode(input_text)
input_tensor = torch.tensor([input_ids], dtype=torch.long)

# Generate without tracking gradients
with torch.no_grad():
    output = model.generate(input_tensor, max_length=100)
generated_text = tokenizer.decode(output[0].tolist())
print(generated_text)
```

## Limitations

- The model has a context length of 512 tokens.
- It was trained on WikiText, which may not cover specialized domains.
- As a lightweight model, it does not perform as well as larger LLMs on complex tasks.

## Citation

If you use this model in your research, please cite:

```
@misc{GibberishGPT,
  author = {Gathara, Michael and Menon, Vaishak and Liu, Jason},
  title = {GibberishGPT: A Lightweight Language Model with Flash Attention},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face model repository},
  howpublished = {\url{https://huggingface.co/PurelyUnfunctionalAI/GibberishGPT}}
}
```
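
## Architecture Sketch

For readers using this card to learn about LLM training, the sketch below shows how a decoder block with the dimensions listed under Model Details (512-dim embeddings, 8 heads, 8 layers, 512-token context) can get Flash Attention through PyTorch's `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a Flash Attention kernel when the hardware and dtype allow it. This is a minimal illustration, not the repository's actual code; all class and variable names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters from the Model Details section
EMBED_DIM = 512    # Embedding Size
N_HEADS = 8        # Attention Heads
N_LAYERS = 8       # Hidden Layers
CONTEXT_LEN = 512  # Context Length

class DecoderBlock(nn.Module):
    """One pre-norm decoder block (illustrative; not the repo's module)."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.ln1 = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim)   # fused Q, K, V projection
        self.proj = nn.Linear(dim, dim)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim) as SDPA expects
        q, k, v = (
            z.view(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
            for z in (q, k, v)
        )
        # is_causal=True applies the autoregressive mask; PyTorch picks a
        # Flash Attention kernel here when one is available.
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, d)
        x = x + self.proj(attn)      # residual around attention
        x = x + self.mlp(self.ln2(x))  # residual around the MLP
        return x

# Stack N_LAYERS blocks and run a dummy context-length batch through them
blocks = nn.Sequential(*[DecoderBlock(EMBED_DIM, N_HEADS) for _ in range(N_LAYERS)])
x = torch.randn(1, CONTEXT_LEN, EMBED_DIM)
print(blocks(x).shape)  # torch.Size([1, 512, 512])
```

Using the fused SDPA call rather than materializing the full attention matrix is what keeps memory linear in sequence length, which is the main practical benefit of Flash Attention at this scale.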