---
license: mit
datasets:
  - open-thoughts/OpenThoughts-114k
language:
  - en
base_model:
  - Qwen/Qwen3-1.7B
  - Qwen/Qwen3-4B
  - Qwen/Qwen2.5-1.5B-Instruct
tags:
  - vae
---

# VAE Layer for the Gated Latent Reasoning Loop (tentative name)

Please refer to our code at https://github.com/elliot-zzh/from-transparent-to-opaque. The project is under construction, and we will publish the paper once it is ready.

This repository contains the pretrained VAE layers for the Gated Latent Reasoning Loop (tentative name) research project.

There are three VAEs, each trained for a different base model:

- Qwen/Qwen3-1.7B
- Qwen/Qwen3-4B
- Qwen/Qwen2.5-1.5B-Instruct

Each VAE consists of two linear layers: a compressor, which maps hidden states into the latent space, and an uncompressor, which maps latents back to the hidden dimension.
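
As a rough illustration, here is a minimal PyTorch sketch of that two-layer structure. The class and attribute names (`VAELayer`, `compressor`, `uncompressor`), the placeholder dimensions, and the choice to produce the mean and log-variance by splitting a single linear projection are assumptions for illustration only; see the GitHub repository linked above for the actual implementation.

```python
import torch
import torch.nn as nn

class VAELayer(nn.Module):
    """Hypothetical sketch of a two-linear-layer VAE (compressor + uncompressor).

    hidden_dim and latent_dim are placeholders; the real values depend on the
    base model (e.g. the hidden size of Qwen3-1.7B).
    """

    def __init__(self, hidden_dim: int, latent_dim: int):
        super().__init__()
        # Compressor: projects a hidden state to the latent mean and log-variance.
        # (Splitting one linear output into mu/logvar is an assumption here.)
        self.compressor = nn.Linear(hidden_dim, 2 * latent_dim)
        # Uncompressor: projects a sampled latent back to the hidden dimension.
        self.uncompressor = nn.Linear(latent_dim, hidden_dim)

    def forward(self, h: torch.Tensor):
        mu, logvar = self.compressor(h).chunk(2, dim=-1)
        # Standard reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.uncompressor(z), mu, logvar

# Placeholder sizes, for illustration only.
vae = VAELayer(hidden_dim=2048, latent_dim=256)
recon, mu, logvar = vae(torch.randn(1, 2048))
```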