Fine-tune Gemma3n on videos with audio in them, on a Colab A100. Just dropped the notebook where you can learn how to fine-tune Gemma3n on images + audio + text at the same time!
Keep in mind, it's made for educational purposes. We do LoRA, audio resampling & video downsampling to be able to train in under 40GB of VRAM. Stretch modalities and unfreeze layers as you wish! merve/smol-vision
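If you just want the shape of it, here's a minimal sketch of the LoRA setup, assuming the transformers Gemma3n classes and peft; the model id and target modules are my assumptions, and the notebook has the full pipeline (including the resampling and downsampling steps):

```python
import torch
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-3n-E2B-it"  # assumed checkpoint name; check the notebook

processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3nForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# LoRA: freeze the base model and train small low-rank adapters only.
# This is what keeps the run under ~40GB of VRAM on an A100.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction is trainable
```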
This release refines kernel selection in the kernelize function:
• You can now register kernels for certain CUDA capability ranges, as in the sketch below.
• Rather than requiring an exact match on modes, kernelize falls back to other compatible modes. If you are kernelizing for inference but you only registered a training + torch.compile kernel, it will use that kernel, since it is compatible with inference as well.
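For the capability-range part, registration looks roughly like this; a sketch assuming the Device/CUDAProperties mapping style from the kernels docs (the repo id is a placeholder, and names may differ across versions):

```python
from kernels import CUDAProperties, Device, LayerRepository, register_kernel_mapping

register_kernel_mapping(
    {
        "SiluAndMul": {
            # Use this kernel only on GPUs with CUDA capability 7.5 through 8.9.
            Device(
                type="cuda",
                properties=CUDAProperties(min_capability=75, max_capability=89),
            ): LayerRepository(
                repo_id="kernels-community/activation",  # placeholder repo
                layer_name="SiluAndMul",
            ),
        }
    }
)
```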
We've moved over 20PB from Git LFS to Xet on the Hub without downtime or data loss. Having things "just work" on a migration of this scale is about as good as it gets.
In the early days of joining Hugging Face, we made a few key design decisions:
* There would be no "hard cut-over" from Git LFS to Xet
* A Xet-enabled repository should be able to contain both Xet and LFS files
* Repository migrations from LFS to Xet can run in the background without disrupting downloads or uploads
These were largely driven by our desire to ensure the community could keep working without interruption.
We cover the infrastructure making this all go in this post, specifically:
* An integral piece of infrastructure known internally as the Git LFS Bridge
* Background content migrations that run around the clock
This release makes it possible to register multiple kernels for a layer. Do you have a super-fast kernel for inference and another kernel for training? Register them both and kernelize will pick the kernel depending on whether you are going to do training or inference.
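In code, that looks something like the sketch below, assuming the mode-keyed mapping from the kernels docs (the repo ids are placeholders):

```python
import torch
from torch import nn
from kernels import (
    LayerRepository,
    Mode,
    kernelize,
    register_kernel_mapping,
    use_kernel_forward_from_hub,
)

# Mark a layer as replaceable by a Hub kernel under the name "SiluAndMul".
@use_kernel_forward_from_hub("SiluAndMul")
class SiluAndMul(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = x.shape[-1] // 2
        return nn.functional.silu(x[..., :d]) * x[..., d:]

# One kernel per mode: kernelize picks whichever matches how you run.
register_kernel_mapping(
    {
        "SiluAndMul": {
            "cuda": {
                Mode.INFERENCE: LayerRepository(
                    repo_id="kernels-community/activation",  # placeholder repos
                    layer_name="SiluAndMul",
                ),
                Mode.TRAINING: LayerRepository(
                    repo_id="kernels-community/activation",
                    layer_name="SiluAndMul",
                ),
            }
        }
    }
)

model = nn.Sequential(SiluAndMul())
model = kernelize(model, mode=Mode.INFERENCE, device="cuda")  # picks the inference kernel
```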
They have an image tokenizer unified with text, and they de-tokenize using either of two models (an LLM or a diffusion model). The model is actually a full LLM (Qwen2); the tokenizer converts images into tokens in the same space as text.