Any plans to release an updated version based on DeepSeek-V3-0526 + R1, or how to create the merge myself?
I wonder if there are any plans to release a new version based on DeepSeek-V3-0526 and R1?
Or, alternatively, are there instructions for how I can create an updated merge myself?
I assume DeepSeek-V3-0526 still has the same architecture, so instructions for creating the merge would help a lot to save bandwidth: I already have the full version of R1, so once I have V3-0526 I could merge and quantize myself. (Technically, I just found out about the new version at https://www.reddit.com/r/LocalLLaMA/comments/1kvpwq3/deepseek_v3_0526/ and have not yet begun downloading V3-0526.) So I am trying to figure out whether it is better to wait for an updated R1T Chimera, or to download V3-0526 and try to make the merge myself, if that is possible.
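In case it helps to make the question concrete, below is a minimal sketch of what a naive weight-level merge could look like, assuming both checkpoints share the same architecture and the same safetensors sharding layout. This is plain linear interpolation over matching tensors, not necessarily the recipe TNG uses for the Chimera (which, as I understand it, is more selective about which tensors come from which parent). The directory names and the `ALPHA` value are purely illustrative.

```python
# Naive merge sketch: linearly interpolate matching tensors from two
# checkpoints with identical architecture, shard by shard.
# NOT TNG's actual Chimera recipe; directory names and ALPHA are assumptions.

import os
import torch
from safetensors.torch import load_file, save_file

V3_DIR = "DeepSeek-V3-0526"   # hypothetical local checkpoint directory
R1_DIR = "DeepSeek-R1"        # hypothetical local checkpoint directory
OUT_DIR = "merged-chimera-sketch"
ALPHA = 0.5                   # fraction of R1 per tensor; illustrative only

os.makedirs(OUT_DIR, exist_ok=True)

# Assumes both models use the same sharding layout, so shards with the
# same filename contain the same tensor names.
shards = sorted(f for f in os.listdir(V3_DIR) if f.endswith(".safetensors"))

for shard in shards:
    v3 = load_file(os.path.join(V3_DIR, shard))
    r1 = load_file(os.path.join(R1_DIR, shard))
    merged = {}
    for name, w_v3 in v3.items():
        w_r1 = r1[name]
        assert w_v3.shape == w_r1.shape, f"shape mismatch in {name}"
        # Interpolate in float32 to avoid low-precision rounding, cast back.
        merged[name] = ((1 - ALPHA) * w_v3.float()
                        + ALPHA * w_r1.float()).to(w_v3.dtype)
    save_file(merged, os.path.join(OUT_DIR, shard))
    print(f"merged {shard}: {len(merged)} tensors")
```

A real attempt would also need the config and tokenizer files copied over, a lot of RAM per shard, and some care with the FP8 weight storage in the DeepSeek checkpoints (weights plus separate scale tensors), which a naive per-tensor interpolation like this does not handle.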
Hi, no worries. If DeepSeek releases something new, and if the technical parameters of the new release are within range, of course we will create a gaggle of variations :-). Significant testing will have to be done, because even if these are research prototypes, it's better to know than to know not.
For example, on chutes.ai alone, the R1T Chimera is ranked the third most popular model, after V3-0324 and R1. The Chimera currently runs on 22 instances, i.e. it is deployed on 176 H200 GPUs, and processes 4.7B tokens per day. Any new version should either be clearly better than the Chimera or be marked as serving a different purpose.
Hi there,
of course. Our colleague Benjamin already created a first R1-0528-Chimera this evening. I am currently testing it, and judging from the slow speed of the cluster, I guess some of the other TNGlers must be testing it, too :-).
Preliminary result: it appears to be quite well-behaved. Personally, I consider that already a success; after all, it is still a bit of a miracle that the generated child LLMs are functional.
Whether it offers a performance benefit or interesting behaviours, we cannot say yet.
Cheers,
Henrik (and Robert, and Benjamin et al)
PS: It would be nice to have more GPUs.