Token Assorted - Mixing Latent and Text Tokens for Improved Language Model Reasoning

Summary

3+ Most Important Things

1+ Deficiencies

3+ New Ideas

Annotations

Annotation

« In this work, we propose a hybrid representation of the reasoning process, where we partially abstract away the initial reasoning steps using latent discrete tokens generated by VQ-VAE, significantly reducing the length of reasoning traces. »()

Annotation

« To facilitate effective learning, we introduce a simple training procedure that randomly mixes latent and text tokens, which enables fast adaptation to new latent tokens »()

Annotation

« we propose to use discrete latent tokens to abstract the initial steps of the reasoning traces. »()

Annotation

« More precisely, we replace the text tokens with their corresponding latent abstractions from left to right until a pre-set location, leaving the remaining tokens unchanged. »()
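A minimal sketch of this left-to-right replacement (my own illustration, not the paper's code; `compress_chunk` is a stand-in for the VQ-VAE encoder, and the chunk size `r = 16` follows the paper's compression rate):

```python
def compress_chunk(chunk):
    # Stand-in for the VQ-VAE encoder: maps a chunk of r text tokens
    # to a single discrete latent token id (a fake placeholder here).
    return f"<latent:{hash(tuple(chunk)) % 1024}>"

def partially_abstract(text_tokens, boundary, r=16):
    # Replace text tokens left to right until the pre-set `boundary`
    # (a multiple of r), leaving the remaining tokens unchanged.
    latent = [compress_chunk(text_tokens[i:i + r])
              for i in range(0, boundary, r)]
    return latent + text_tokens[boundary:]

trace = [f"tok{i}" for i in range(64)]
mixed = partially_abstract(trace, boundary=32, r=16)
# The first 32 text tokens collapse into 2 latent tokens; the last 32 stay.
```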

Annotation

« we employ a randomized replacement strategy: during training, we randomly vary the number of text tokens being substituted by latent tokens for each sample. »(2)
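The randomized strategy could be sketched like this (uniform sampling of m and the placeholder latent ids are my assumptions):

```python
import random

def randomize_mix(text_tokens, r=16, rng=random):
    # Draw m, the number of leading r-token chunks to abstract,
    # uniformly at random for each training sample.
    num_chunks = len(text_tokens) // r
    m = rng.randint(0, num_chunks)
    latent = [f"<latent:{i}>" for i in range(m)]  # placeholder latent ids
    return latent + text_tokens[m * r:]
```

Varying m per sample exposes the model to every mixing ratio, so it can adapt to latent tokens appearing at any prefix length.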

Annotation

« our VQ-VAE is trained on the whole input sequence X, but only applied to C in the next stage »(3)

Annotation

« When applying the VQ-VAE to compress the text tokens, the discrete latent tokens Z are essentially the index of corresponding embeddings in the codebook. »(3)
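In other words, quantization is a nearest-neighbour lookup in the codebook. A small NumPy sketch (the embedding dimension and batch size are assumptions; only the codebook size of 1024 comes from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 64))  # 1024 codes, 64-dim embeddings (dim assumed)
encoded = rng.normal(size=(4, 64))      # 4 encoder output vectors

# Squared L2 distance from each encoding to every codebook entry, then argmin:
dists = ((encoded[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
Z = dists.argmin(axis=1)                # the discrete latent tokens Z (indices)
```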

Annotation

« We delimit the latent tokens by injecting special start and end tokens to encapsulate them. »(4)

Annotation
Annotation

« one remarkable challenge is to deal with the extended vocabulary »(4)

Annotation

« In the context of our approach, this means we increase the values of m in each stage until it reaches a pre-set cap value. »(4)
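That staged curriculum might look like the following toy schedule (the per-stage increment and the cap value are my assumptions, not the paper's settings):

```python
def curriculum_m(stage, increment=1, cap=8):
    # Progressive curriculum: raise the number m of abstracted chunks
    # by `increment` each training stage until it reaches the pre-set cap.
    return min(stage * increment, cap)

# Stages 1..10 give m = 1, 2, ..., 8, then plateau at the cap of 8.
schedule = [curriculum_m(s) for s in range(1, 11)]
```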

Annotation

« where dedicated optimization tuning is needed »(4)

Annotation

« where the value of m is randomly set for each sample »(4)

Annotation

« For each benchmark, we train a VQVAE for 100k steps using the Adam optimizer, with learning rate 10^-5 and batch size 32. We use a codebook of size 1024 and compress every chunk of L = 16 tokens into a single latent token (i.e., the compression rate r = 16). »(5)
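With these settings each latent token stands for 16 text tokens. A quick sanity check of the trace-length saving (the example trace length of 48 is my own):

```python
r = 16                       # compression rate from the paper
codebook_size = 1024         # latent vocabulary added on top of the text vocab

abstracted = 48              # suppose the first 48 reasoning tokens are abstracted
latent = abstracted // r     # 3 latent tokens replace those 48 text tokens
saved = abstracted - latent  # 45 fewer tokens in the reasoning trace
```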


Related Notes