« we employ a randomized replacement strategy: during training, we randomly vary the number of text tokens being substituted by latent tokens for each sample. »(2)
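A minimal sketch of this randomized replacement, assuming the chain of thought is chunked into groups of L text tokens and each chunk maps to one latent token; the function name and signature are illustrative, not from the paper:

```python
import random

def replace_prefix_with_latents(cot_tokens, latent_tokens, L, m_max):
    """Substitute the first m chunks (L text tokens each) of the chain of
    thought with their m latent tokens; m is drawn fresh for every sample."""
    m = random.randint(0, m_max)              # randomized per-sample replacement count
    return latent_tokens[:m] + cot_tokens[m * L:]
```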
« our VQ-VAE is trained on the whole input sequence X, but only applied to C in the next stage »(3)
« When applying the VQ-VAE to compress the text tokens, the discrete latent tokens Z are essentially the indices of the corresponding embeddings in the codebook. »(3)
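The index lookup described here is a nearest-neighbor search against the codebook. A hedged PyTorch sketch (shapes and names are assumptions):

```python
import torch

def quantize(encodings, codebook):
    """encodings: (T, d) encoder outputs; codebook: (K, d) embedding table.
    Returns the discrete latent tokens Z: for each encoding, the index of
    its nearest codebook embedding, plus the quantized embeddings."""
    dists = torch.cdist(encodings, codebook)  # (T, K) pairwise distances
    z = dists.argmin(dim=1)                   # (T,) codebook indices = latent tokens Z
    return z, codebook[z]
```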
« one remarkable challenge is to deal with the extended vocabulary »(4)
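One standard way to handle an extended vocabulary, sketched here as an assumption rather than the paper's exact recipe, is to append the K codebook vectors as new rows of the token-embedding table, so latent token i becomes ordinary token id vocab_size + i:

```python
import torch
import torch.nn as nn

def extend_vocabulary(embedding: nn.Embedding, codebook: torch.Tensor) -> nn.Embedding:
    """Append K codebook vectors as new token embeddings; latent token i
    is then addressed as vocab_size + i in the extended vocabulary."""
    vocab_size, d = embedding.weight.shape
    extended = nn.Embedding(vocab_size + codebook.size(0), d)
    with torch.no_grad():
        extended.weight[:vocab_size] = embedding.weight  # keep original text tokens
        extended.weight[vocab_size:] = codebook          # new latent-token rows
    return extended
```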
« In the context of our approach, this means we increase the values of m in each stage until it reaches a pre-set cap value. »(4)
« where dedicated optimization tuning is needed »(4)
« where the value of m is randomly set for each sample »(4)
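The two schedules these quotes contrast, as a hedged sketch (stage granularity and the cap are placeholders):

```python
import random

def staged_m(stage: int, m_cap: int) -> int:
    """Progressive curriculum: m grows with the training stage up to the cap."""
    return min(stage, m_cap)

def random_m(m_cap: int) -> int:
    """Randomized strategy: m is drawn independently for each sample."""
    return random.randint(0, m_cap)
```

Drawing m per sample folds all stages into a single training run, which is what lets the randomized strategy avoid the per-stage optimization tuning noted above.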
« For each benchmark, we train a VQ-VAE for 100k steps using the Adam optimizer, with learning rate 10⁻⁵ and batch size 32. We use a codebook of size 1024 and compress every chunk of L = 16 tokens into a single latent token (i.e., the compression rate r = 16). »(5)
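Those hyperparameters collected as a setup sketch; the VQ-VAE architecture itself is not specified in these excerpts, so only the quoted values appear:

```python
from torch.optim import Adam

CODEBOOK_SIZE = 1024   # codebook entries
CHUNK_LEN = 16         # L = 16 text tokens per latent token (compression rate r = 16)
TRAIN_STEPS = 100_000
BATCH_SIZE = 32

def make_optimizer(vqvae_parameters):
    """Adam with the quoted learning rate of 1e-5."""
    return Adam(vqvae_parameters, lr=1e-5)
```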
Date : 02-05-2025
Authors : DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng
Paper Link : http://arxiv.org/abs/2502.03275
Zotero Link : PDF
Tags : #Computer-Science---Artificial-Intelligence, #Computer-Science---Computation-and-Language, #Computer-Science---Machine-Learning, #Computer-Science---Logic-in-Computer-Science
Citation : Su, D., Zhu, H., Xu, Y., Jiao, J., Tian, Y., & Zheng, Q. (2025). Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning. arXiv:2502.03275.