Any plans to release an updated version based on DeepSeek-V3-0526 + R1, or how to create the merge myself?
I wonder if there are any plans to release a new version based on DeepSeek-V3-0526 and R1?
Or, alternatively, are there instructions for how I can create an updated merge myself?
I assume DeepSeek-V3-0526 still has the same architecture, so instructions for creating the merge would help a lot to save bandwidth: I already have the full version of R1, so once I have V3-0526 I could merge and quantize myself. (Technically, I just found out about the new version at https://www.reddit.com/r/LocalLLaMA/comments/1kvpwq3/deepseek_v3_0526/ and have not yet begun downloading V3-0526.) So I am trying to figure out whether it is better to wait for an updated R1T Chimera, or to download V3-0526 and try to make the merge myself, if that is possible.
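In case it helps to make the question concrete, below is a minimal sketch of what a naive weight-level merge could look like, assuming both checkpoints share the same architecture and the same safetensors sharding layout. This is plain linear interpolation over matching tensors, not necessarily the recipe TNG uses for the Chimera (which, as I understand it, is more selective about which tensors come from which parent). The directory names and the `ALPHA` value are purely illustrative.

```python
# Naive merge sketch: linearly interpolate matching tensors from two
# checkpoints with identical architecture, shard by shard.
# NOT TNG's actual Chimera recipe; directory names and ALPHA are assumptions.

import os
import torch
from safetensors.torch import load_file, save_file

V3_DIR = "DeepSeek-V3-0526"   # hypothetical local checkpoint directory
R1_DIR = "DeepSeek-R1"        # hypothetical local checkpoint directory
OUT_DIR = "merged-chimera-sketch"
ALPHA = 0.5                   # fraction of R1 per tensor; illustrative only

os.makedirs(OUT_DIR, exist_ok=True)

# Assumes both models use the same sharding layout, so shards with the
# same filename contain the same tensor names.
shards = sorted(f for f in os.listdir(V3_DIR) if f.endswith(".safetensors"))

for shard in shards:
    v3 = load_file(os.path.join(V3_DIR, shard))
    r1 = load_file(os.path.join(R1_DIR, shard))
    merged = {}
    for name, w_v3 in v3.items():
        w_r1 = r1[name]
        assert w_v3.shape == w_r1.shape, f"shape mismatch in {name}"
        # Interpolate in float32 to avoid low-precision rounding, cast back.
        merged[name] = ((1 - ALPHA) * w_v3.float()
                        + ALPHA * w_r1.float()).to(w_v3.dtype)
    save_file(merged, os.path.join(OUT_DIR, shard))
    print(f"merged {shard}: {len(merged)} tensors")
```

A real attempt would also need the config and tokenizer files copied over, a lot of RAM per shard, and some care with the FP8 weight storage in the DeepSeek checkpoints (weights plus separate scale tensors), which a naive per-tensor interpolation like this does not handle.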
Hi, no worries. If DeepSeek releases something new, and if the technical parameters of the new release are within range, of course we will create a gaggle of variations :-). Significant testing will have to be done, because even if these are research prototypes, it's better to know than to know not.
For example, on chutes.ai alone, the R1T Chimera is ranked the third most popular model, after V3-0324 and R1. The Chimera currently runs on 22 instances, i.e. it is deployed on 176 H200 GPUs, and processes 4.7B tokens per day. Any new version should either be clearly better than the Chimera or be marked as serving a different purpose.
Hi there,
of course. Our colleague Benjamin already created a first R1-0528-Chimera this evening. I am currently testing it, and judging from the slow speed of the cluster, I guess some of the other TNGlers must be testing it, too :-).
Preliminary result: it appears to be quite well-behaved. Personally, I consider that already a success; after all, it is still a bit of a miracle that the generated child LLMs are functional.
Whether it offers a performance benefit or interesting behaviours, we cannot say yet.
Cheers,
Henrik (and Robert, and Benjamin et al)
PS: It would be nice to have more GPUs.