|
--- |
|
license: mit |
|
datasets: |
|
- open-thoughts/OpenThoughts-114k |
|
language: |
|
- en |
|
base_model: |
|
- Qwen/Qwen3-1.7B |
|
- Qwen/Qwen3-4B |
|
- Qwen/Qwen2.5-1.5B-Instruct |
|
tags: |
|
- vae |
|
--- |
|
|
|
# VAE Layer for the Research *Gated Latent Reasoning Loop* (tentative name) |
|
|
|
> Please refer to our code: [https://github.com/elliot-zzh/from-transparent-to-opaque](https://github.com/elliot-zzh/from-transparent-to-opaque). |
|
> The project is under construction, and we will publish the paper once we are ready. |
|
|
|
This is the pretrained VAE layer for the research *Gated Latent Reasoning Loop* (tentative name). |
|
|
|
There are three VAEs, each trained for a different base model:
|
|
|
- [`vae_epoch10.pth`](https://huggingface.co/ethangoh7086cmd/gated-latent-reasoning-loop-vae/blob/main/vae_epoch10.pth): The VAE for [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B). The input size is 4096 (this value has not yet been confirmed).
|
- [`vae_epoch15.pth`](https://huggingface.co/ethangoh7086cmd/gated-latent-reasoning-loop-vae/blob/main/vae_epoch15.pth): The VAE for [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B). The input size is 2048. |
|
- [`vae_epoch14.pth`](https://huggingface.co/ethangoh7086cmd/gated-latent-reasoning-loop-vae/blob/main/vae_epoch14.pth): The VAE for [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct). The input size is 1536. |
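
For reference, a checkpoint can be fetched and inspected along the lines of the sketch below. The repository and file names come from the links above; everything else (such as using `huggingface_hub` for the download) is just one possible way to do it.

```python
import torch
from huggingface_hub import hf_hub_download

# Repo and file names are taken from the links above.
path = hf_hub_download(
    repo_id="ethangoh7086cmd/gated-latent-reasoning-loop-vae",
    filename="vae_epoch15.pth",  # the VAE paired with Qwen3-1.7B (input size 2048)
)

state = torch.load(path, map_location="cpu")

# If the file stores a plain state dict, list the parameter names and shapes.
if isinstance(state, dict):
    for name, value in state.items():
        shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
        print(name, shape)
```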
|
|
|
Each VAE consists of two linear layers: a compressor, which maps the base model's hidden states into the latent space, and an uncompressor, which maps latent vectors back to the hidden-state dimension.
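
As a rough sketch only (not the project's actual training code), a VAE of this shape could be written as follows. The class name, the latent size, and the choice of having the compressor emit both the mean and log-variance are assumptions, and the parameter names inside the released checkpoints may differ.

```python
import torch
import torch.nn as nn

class ReasoningVAE(nn.Module):
    """Illustrative two-linear-layer VAE: a compressor into the latent space
    and an uncompressor back to the hidden-state dimension."""

    def __init__(self, hidden_size: int = 2048, latent_size: int = 256):
        # latent_size (and the class/attribute names) are assumptions, not confirmed values.
        super().__init__()
        # Compressor: a single linear layer emitting both mean and log-variance.
        self.compressor = nn.Linear(hidden_size, 2 * latent_size)
        # Uncompressor: a single linear layer mapping latent samples back to hidden_size.
        self.uncompressor = nn.Linear(latent_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor):
        mu, logvar = self.compressor(hidden_states).chunk(2, dim=-1)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        reconstruction = self.uncompressor(z)
        return reconstruction, mu, logvar
```

If the parameter names happen to match, a downloaded checkpoint could then be loaded with `load_state_dict`; see the loading sketch above for fetching the file.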