Wur doomed!

#14
by jukofyork - opened

Continuation of THE THREAD OF DOOM.


What do you and the others think of the distilled R1 models for writing?

The llama3 / qwen models SFT'd on R1 outputs? I only tried 2 of them.

R1 Qwen (32b) - Lacks knowledge of fiction (same as the official Qwen release), so its writing is no better.

R1 Llama3 - This is generally the worst of them (not just for writing). It'll generate the CoT and then write something completely different.

CoT traces won't let the model do anything out of distribution, so they're not very useful if the base model doesn't already have much of the relevant material in its training data.

Yeah, I have tried the same two and felt the same way.

I also felt that any attempt to add an R1 distill to the merge recipe of an existing merge project made it worse...so far...

@gghfez @BigHuggyD That has been my experience as well, which is a shame, as I had a go with R1 on OpenRouter and was blown away.

In your experience, what model comes anywhere close that's usable on a machine with 24 GB of VRAM and 32 GB of RAM?

There's nothing like it for now. I'm running R1 slowly on my ThreadRipper:

prompt eval time =   14026.61 ms /   918 tokens (   15.28 ms per token,    65.45 tokens per second)
       eval time =  398806.12 ms /  1807 tokens (  220.70 ms per token,     4.53 tokens per second)
      total time =  412832.73 ms /  2725 tokens
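(For anyone wanting to reproduce something similar: those timings look like llama.cpp output, so below is a minimal sketch using llama-cpp-python. The model path, context size, and thread count are placeholders, not the actual setup used for the numbers above.)

```python
# Minimal sketch: running a local GGUF quant via llama-cpp-python.
# Model path, context size and thread count are placeholders, not the
# actual configuration behind the timings quoted above.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Q4_K_M.gguf",  # placeholder path to a local quant
    n_ctx=8192,                            # context window
    n_threads=32,                          # CPU threads; tune to the machine
    verbose=True,                          # prints prompt/eval timings like the ones above
)

out = llm(
    "Write the opening paragraph of a gritty, realistic short story.",
    max_tokens=512,
    temperature=0.8,
)
print(out["choices"][0]["text"])
```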

I tried training Wizard2 8x22b MoE on R1 data, but it doesn't really work well. It will plan ahead in think tags, e.g.:

I need to ensure the story maintains its gritty, realistic tone without becoming overly melodramatic. The characters' growth should be subtle but significant. Also, the ending should leave a sense of hope but not be too neat—their redemption is fragile, and the future is uncertain.

Let me outline the next few chapters:

Chapter 5: Nightmares and Trust
...

But it doesn't backtrack like R1 does. It just kind of agrees with itself and ends up writing how it usually would:

“I don’t know what I want anymore,” she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead.

lol
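(For anyone curious what "training on R1 data" looks like in practice, here's a rough sketch of packing an R1-style trace into a single SFT target with think tags. The field names and wording are assumptions for illustration, not the actual recipe used above.)

```python
# Rough sketch: formatting an R1-style sample for SFT, with the reasoning
# wrapped in <think> tags ahead of the visible answer. Field names and the
# example text are placeholders, not the actual training recipe used above.
def build_sft_target(reasoning: str, answer: str) -> str:
    """Pack the chain-of-thought and the final prose into one training target."""
    return f"<think>\n{reasoning.strip()}\n</think>\n\n{answer.strip()}"

sample = {
    "prompt": "Continue the story, keeping the gritty, realistic tone.",
    "target": build_sft_target(
        reasoning="I need to keep the tone grounded and avoid melodrama...",
        answer="Rain tapped against the corrugated roofing overhead...",
    ),
}
print(sample["target"])
```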

Ahhh that's a shame :-(

"I don’t know what I want anymore,” she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead."

Oh god!

I'll have to keep an eye on this thread.

I did enjoy Ppoyaa/MythoNemo-L3.1-70B-v1.0

But my tastes are probably not as refined as others' on this thread ;-)

The space you can use to merge LoRAs using the free "CPU only" tier is here:

Streaming the weights through the space to avoid the low storage limitation! That's a really good idea!


Hopefully they don't notice... :D
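(For context, the core of what such a space does is just a LoRA merge. Below is a minimal sketch with transformers + PEFT; the repo names are placeholders, and unlike the trick described above it loads everything into memory rather than streaming the shards.)

```python
# Minimal sketch of merging a LoRA into its base model with transformers + PEFT.
# Repo names are placeholders; unlike the space described above, this loads the
# full weights into memory rather than streaming them shard by shard.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model/placeholder", torch_dtype="auto")
lora = PeftModel.from_pretrained(base, "lora-adapter/placeholder")

merged = lora.merge_and_unload()          # bake the LoRA deltas into the base weights
merged.save_pretrained("merged-model")    # then push or quantize as usual

tok = AutoTokenizer.from_pretrained("base-model/placeholder")
tok.save_pretrained("merged-model")
```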

Just to check:

https://huggingface.co/gghfez/creative-thinker-32b-Q4_K_M-GGUF

is from the new model I converted today and not the one from yesterday that I forgot to scale the lm_head tensor (and is now deleted)?


I assume so, as the one from yesterday was called creative-thinker-32b-preview-06-2025 rather than just creative-thinker-32b? I'm downloading it now to test if it seems the same as a local merge...

Sadly, seems broken to me :/ Will try and have a look tomorrow to see why.

creative-thinker-32b

Yeah it was this one. I just tested it and it seems to go off the rails (at temp=0.8 and temp=1.0) vs the creative-thinker-32b-preview-06-2025?

FYI - I just used this space: ggml-org/gguf-my-repo to create that. It works for up to 34b models.
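(For bigger models the same conversion can be done locally; a rough sketch is below. The script names and paths assume a recent llama.cpp checkout and are placeholders, not what the space actually runs.)

```python
# Rough sketch of doing the same conversion locally with a llama.cpp checkout
# (roughly what the gguf-my-repo space automates). Paths and script names
# assume a recent llama.cpp build and are placeholders.
import subprocess

MODEL_DIR = "creative-thinker-32b"          # local HF-format checkpoint
F16_GGUF = "creative-thinker-32b-f16.gguf"
Q4_GGUF = "creative-thinker-32b-Q4_K_M.gguf"

# 1) Convert the HF checkpoint to an f16 GGUF.
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# 2) Quantize it to Q4_K_M.
subprocess.run(
    ["llama.cpp/build/bin/llama-quantize", F16_GGUF, Q4_GGUF, "Q4_K_M"],
    check=True,
)
```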


Yeah, it might be the scaling of lm_head - I'm retrying it now locally without that.
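(In case it helps anyone following along, the lm_head scaling being discussed is roughly the operation below. This is a sketch only: the scale factor is a placeholder, and the actual value used in the merge isn't stated in this thread.)

```python
# Sketch of scaling the lm_head tensor of a merged model before saving.
# SCALE is a placeholder; the actual factor used in the merge isn't given here.
# Note: the attribute is lm_head for most causal LMs, but tied-embedding
# architectures would need different handling.
import torch
from transformers import AutoModelForCausalLM

SCALE = 1.0  # placeholder - set to whatever the merge recipe calls for

model = AutoModelForCausalLM.from_pretrained(
    "creative-thinker-32b", torch_dtype=torch.bfloat16
)

with torch.no_grad():
    model.lm_head.weight.mul_(SCALE)  # scale the output projection in place

model.save_pretrained("creative-thinker-32b-scaled")
```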


I've hidden the models until I figure out what has gone wrong so probably a good idea to hide (or just delete) the broken GGUF too.

Deleted
