
Ruslan Vasilev PRO

artnitolog

AI & ML interests

None yet

Organizations

None yet

Posts 3

awesome-arXiv 🚀: https://github.com/artnitolog/awesome-arxiv

I've just released awesome-arXiv, a curated collection of tools, libraries, datasets, and resources for discovering, reading, and automating your work with arXiv papers.

Feedback and contributions are welcome!

Recently, we open-sourced YaFSDP, Yandex’s tool for efficient distributed training of LLMs.

Here are some of the key ideas used in YaFSDP to provide speedup and memory savings over FSDP:
• Allocate just two buffers and reuse them across all transformer layers to hold the gathered weights, bypassing the PyTorch memory allocator;
• Gather the small normalization layers once at the beginning of the iteration and average their gradients only at the end;
• Move gradient division to the very end of the backward pass.
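To make the first and third ideas more concrete, here is a toy sketch in plain Python. It is not YaFSDP's code. It assumes several ranks simulated inside one process, with lists standing in for GPU tensors and a hand-written "all-gather" standing in for real collectives. `TwoBufferPool`, `all_gather_into`, and `reduce_grads_deferred` are hypothetical names chosen for illustration.

```python
# Illustrative sketch only (hypothetical names): simulates ranks in one process
# with Python lists; YaFSDP itself works on GPU tensors and real collectives.
from typing import List


class TwoBufferPool:
    """Preallocates two flat buffers and hands them out alternately, so
    gathering weights for any number of layers uses only two buffers."""

    def __init__(self, buffer_size: int) -> None:
        self.buffers = [[0.0] * buffer_size, [0.0] * buffer_size]

    def get(self, layer_idx: int) -> List[float]:
        # Even layers reuse buffer 0, odd layers reuse buffer 1, so the next
        # layer can be gathered while the current one is still in use.
        return self.buffers[layer_idx % 2]


def all_gather_into(buffer: List[float], shards: List[List[float]]) -> List[float]:
    """Simulated all-gather: copies each rank's shard into the buffer in place."""
    offset = 0
    for shard in shards:
        buffer[offset:offset + len(shard)] = shard
        offset += len(shard)
    return buffer


def reduce_grads_deferred(per_rank_grads: List[List[float]]) -> List[float]:
    """Sums gradients across ranks, then divides by the world size once at the
    end instead of pre-dividing every rank's contribution."""
    world_size = len(per_rank_grads)
    summed = [sum(vals) for vals in zip(*per_rank_grads)]
    return [g / world_size for g in summed]


# Four layers, two ranks, each rank holding a 2-element shard of every layer.
layer_shards = [[[float(layer), 1.0], [2.0, 3.0]] for layer in range(4)]
pool = TwoBufferPool(buffer_size=4)
for i, shards in enumerate(layer_shards):
    full_weights = all_gather_into(pool.get(i), shards)
    print(i, full_weights)

print(len(pool.buffers))  # 2
print(reduce_grads_deferred([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Under these assumptions, the number of buffers stays at two no matter how many layers are gathered. The deferred reduction yields the same mean as dividing before summing, but it performs the division only once per gradient element.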

To learn more about how YaFSDP works, check out our latest blog post: https://medium.com/yandex/yafsdp-a-tool-for-faster-llm-training-and-optimized-gpu-utilization-is-no-632b7539f5b3

Models 0

None public yet