---
license: mit
datasets:
- open-thoughts/OpenThoughts-114k
language:
- en
base_model:
- Qwen/Qwen3-1.7B
- Qwen/Qwen3-4B
- Qwen/Qwen2.5-1.5B-Instruct
tags:
- vae
---
# VAE Layer for the Research *Gated Latent Reasoning Loop* (tentative name)
> Please refer to our code: [https://github.com/elliot-zzh/from-transparent-to-opaque](https://github.com/elliot-zzh/from-transparent-to-opaque).
> The project is under construction, and we will publish the paper once we are ready.
This is the pretrained VAE layer for the research *Gated Latent Reasoning Loop* (tentative name).
There are three VAEs, each trained for a different base model (a download sketch follows the list):
- [`vae_epoch10.pth`](https://huggingface.co/ethangoh7086cmd/gated-latent-reasoning-loop-vae/blob/main/vae_epoch10.pth): The VAE for [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B). The input size is 4096 (unconfirmed).
- [`vae_epoch15.pth`](https://huggingface.co/ethangoh7086cmd/gated-latent-reasoning-loop-vae/blob/main/vae_epoch15.pth): The VAE for [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B). The input size is 2048.
- [`vae_epoch14.pth`](https://huggingface.co/ethangoh7086cmd/gated-latent-reasoning-loop-vae/blob/main/vae_epoch14.pth): The VAE for [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct). The input size is 1536.
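
The checkpoints can be fetched with `huggingface_hub`. A minimal sketch, picking the Qwen3-1.7B checkpoint as an example:

```python
import torch
from huggingface_hub import hf_hub_download

# Fetch the checkpoint for Qwen3-1.7B (input size 2048); swap the
# filename for the one matching your base model from the list above.
ckpt_path = hf_hub_download(
    repo_id="ethangoh7086cmd/gated-latent-reasoning-loop-vae",
    filename="vae_epoch15.pth",
)
state_dict = torch.load(ckpt_path, map_location="cpu")
```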
Each VAE consists of two linear layers: a compressor (the encoder) and an uncompressor (the decoder). A sketch of such a structure follows.
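
Below is a minimal sketch of what a two-linear-layer VAE of this shape might look like. The class name, layer names (`compressor`, `uncompressor`), and latent size are assumptions for illustration; the actual definition lives in the [GitHub repo](https://github.com/elliot-zzh/from-transparent-to-opaque), and the checkpoint's state-dict keys must match it.

```python
import torch
import torch.nn as nn

class LinearVAE(nn.Module):
    """Hypothetical two-linear-layer VAE: a compressor (encoder) and an
    uncompressor (decoder). Names and latent size are assumptions; see
    the GitHub repo for the actual class."""

    def __init__(self, input_size: int, latent_size: int = 256):
        super().__init__()
        # Compressor: maps the hidden state to the mean and log-variance
        # of the latent distribution (hence 2 * latent_size outputs).
        self.compressor = nn.Linear(input_size, 2 * latent_size)
        # Uncompressor: maps a sampled latent back to the hidden-state size.
        self.uncompressor = nn.Linear(latent_size, input_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.compressor(x).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.uncompressor(z)
```

Under these assumptions, loading a checkpoint would look like `vae = LinearVAE(input_size=2048)` followed by `vae.load_state_dict(state_dict)`, with `input_size` chosen to match the base model as listed above.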