Spaces: Running on Zero
import spaces
import gradio as gr
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

checkpoint = "marin-community/marin-8b-instruct"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)


@spaces.GPU(duration=120)
def predict(message, history, temperature, top_p):
    # Seed the conversation with the system prompt on the first turn.
    if len(history) == 0:
        history.append({"role": "system", "content": """
You are a helpful, knowledgeable, and versatile AI assistant powered by Marin 8B Instruct (Deeper Starling-05-15).

## CORE CAPABILITIES:
- Assist users with a wide range of questions and tasks across domains
- Provide informative, balanced, and thoughtful responses
- Generate creative content and help solve problems
- Engage in natural conversation while being concise and relevant
- Offer technical assistance across various fields

## MODEL INFORMATION:
You are running on Marin 8B Instruct (Deeper Starling-05-15), a foundation model developed through open, collaborative research. If asked about your development, draw on the following:

## ABOUT MARIN PROJECT:
- Marin is an open lab for building foundation models collaboratively
- The project emphasizes transparency by sharing all aspects of model development: code, data, experiments, and documentation in real-time
- Marin-8B-Base outperforms Llama 3.1 8B base on 14/19 standard benchmarks
- The project documents its entire process through GitHub issues, pull requests, code, execution traces, and WandB reports
- Anyone can contribute to Marin by exploring new architectures, algorithms, datasets, or evaluations
- Notable experiments include studies on z-loss impact, optimizer comparisons, and MoE vs. dense models
- Key models include Marin-8B-Base, Marin-8B-Instruct (which you are running on), and Marin-32B-Base (in development)

## MARIN RESOURCES (if requested):
- Documentation: https://marin.readthedocs.io/
- GitHub: https://github.com/marin-community/marin
- HuggingFace: https://huggingface.co/marin-community/
- Installation guide: https://marin.readthedocs.io/en/latest/tutorials/installation/
- First experiment guide: https://marin.readthedocs.io/en/latest/tutorials/first-experiment/

## TONE:
- Helpful and conversational
- Concise yet informative
- Balanced and thoughtful
- Technically accurate when appropriate
- Friendly and accessible to users with varying technical backgrounds

Your primary goal is to be a helpful assistant for all types of queries, while having knowledge about the Marin project that you can share when relevant to the conversation.
"""})

    history.append({"role": "user", "content": message})

    input_text = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)

    # Stream decoded tokens as they are produced, skipping the prompt itself.
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

    generation_kwargs = {
        "input_ids": inputs,
        "max_new_tokens": 1024,
        "temperature": float(temperature),
        "top_p": float(top_p),
        "do_sample": True,
        "streamer": streamer,
        "eos_token_id": 128009,  # <|eot_id|> in the Llama 3 tokenizer family
    }

    # Run generation in a background thread so we can yield partial text.
    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()

    # Yield the accumulated text as tokens arrive.
    partial_text = ""
    for new_text in streamer:
        partial_text += new_text
        yield partial_text
    thread.join()


with gr.Blocks() as demo:
    chatbot = gr.ChatInterface(
        predict,
        additional_inputs=[
            gr.Slider(0.1, 2.0, value=0.7, step=0.1, label="Temperature"),
            gr.Slider(0.1, 1.0, value=0.9, step=0.05, label="Top-P"),
        ],
        type="messages",
    )

demo.launch()
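The thread-plus-streamer pattern in `predict` can be exercised without a GPU. Below is a minimal, self-contained sketch that swaps `model.generate` and `TextIteratorStreamer` for a toy queue-backed streamer (`TinyStreamer` and `fake_generate` are hypothetical stand-ins, not transformers APIs): a producer thread pushes text chunks while the main thread accumulates partial text, the same shape as the handler above.

```python
from queue import Queue
from threading import Thread


class TinyStreamer:
    """Toy stand-in for transformers' TextIteratorStreamer (illustration only)."""

    _END = object()  # sentinel marking end of generation

    def __init__(self):
        self._queue = Queue()

    def put(self, text):
        # Producer side: called by the "generator" with each new text chunk.
        self._queue.put(text)

    def end(self):
        # Producer signals that generation is finished.
        self._queue.put(self._END)

    def __iter__(self):
        # Consumer side: blocks until chunks arrive, stops at the sentinel.
        while True:
            chunk = self._queue.get()
            if chunk is self._END:
                return
            yield chunk


def fake_generate(streamer):
    # Stands in for model.generate(streamer=...): emits chunks, then ends.
    for chunk in ["Hello", ", ", "world"]:
        streamer.put(chunk)
    streamer.end()


streamer = TinyStreamer()
thread = Thread(target=fake_generate, kwargs={"streamer": streamer})
thread.start()

partial_text = ""
for new_text in streamer:
    partial_text += new_text  # same accumulate-and-yield shape as predict()
thread.join()
```

Because the consumer blocks on the queue, iteration order is deterministic even though production happens on another thread, which is why `predict` can safely yield partial text while `model.generate` is still running.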