Compressed Chain of Thought - Efficient Reasoning Through Dense Representations
Summary
In this paper, the authors reason in the continuous embedding space rather than by decoding discrete tokens. The main motivation is to reduce decoding time and address the latency issue. They also hypothesize that reasoning performance can be tuned by the number of contemplation tokens generated. These tokens are trained through teacher forcing against the hidden states of the original CoT tokens.
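A minimal sketch of what this hidden-state teacher forcing could look like; the module interface, the even-spaced subsampling, and the MSE objective below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# Illustrative sketch (assumption): the CCOT module emits k continuous
# "contemplation" embeddings and is trained to match the hidden states of a
# subsampled set of gold CoT tokens (teacher forcing on hidden states).
def ccot_teacher_forcing_loss(ccot_module, prompt_ids, gold_cot_hidden, k):
    # gold_cot_hidden: [len_cot, d_model] hidden states of the original CoT,
    # precomputed by running the base LLM over the gold reasoning chain.
    # Subsample k evenly spaced target states as the compression targets.
    idx = torch.linspace(0, gold_cot_hidden.size(0) - 1, k).long()
    targets = gold_cot_hidden[idx]                  # [k, d_model]

    # Hypothetical interface: the CCOT module autoregressively produces
    # k continuous embeddings conditioned on the prompt.
    pred = ccot_module(prompt_ids, num_steps=k)     # [k, d_model]

    # Teacher forcing in embedding space: regress onto the gold hidden states.
    return nn.functional.mse_loss(pred, targets)
```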
- The authors believe these compressed representations can later be decoded back into the reasoning path.
- Generates compressed, contentful, continuous embeddings that are much shorter than the original CoT while keeping accuracy the same or close to it.
- They trained two modules: (1) one that generates the CCOT tokens, and (2) one that decodes the answer given the CCOT tokens.
- Trained the CCOT generator with teacher forcing against the gold hidden states of the original CoT tokens (as in the sketch above).
- Trained the decoder independently, using only the first CCOT token from the previous module to initiate the CoT.
- During inference, the CCOT module generates the CCOT tokens; once it finishes, the trained decoder module generates the answer (see the sketch after this list).
- Problem: the results are not promising.
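A rough sketch of the two-module inference flow described above; the `step`/`generate` interfaces and the stopping signal are assumptions made for illustration, not the paper's actual API.

```python
import torch

def ccot_inference(prompt_ids, ccot_module, decoder_module, max_ccot_steps=32):
    """Sketch of the inference loop: generate contemplation embeddings,
    then hand them to the decoder module to produce the final answer."""
    contemplation = []
    state = None
    for _ in range(max_ccot_steps):
        # Hypothetical interface: each step emits one continuous embedding
        # plus a signal indicating whether contemplation is finished.
        emb, done, state = ccot_module.step(prompt_ids, state)
        contemplation.append(emb)
        if done:
            break

    # The separately trained decoder conditions on the prompt and the
    # compressed embeddings to generate the answer tokens.
    ccot_embeds = torch.stack(contemplation)        # [k, d_model]
    answer_ids = decoder_module.generate(prompt_ids, ccot_embeds)
    return answer_ids
```

The key point is that this loop never decodes discrete reasoning tokens, which is where the claimed latency savings come from.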
Terms
- Contentful tokens are those that are either semantically contentful themselves or whose hidden states are derived from semantically contentful tokens.
Annotations
« Our method differs in that the contemplation tokens we generate are grounded in text rather than only used as a signal to decode from. »(2)
« our grounding offers the future potential for decoding the reasoning chain from the compressed representations, allowing for post-hoc human inspection of the LLM’s reasoning. »(2)
Date : 12-17-2024
Authors : Jeffrey Cheng, Benjamin Van Durme
Paper Link : http://arxiv.org/abs/2412.13171
Zotero Link: Full Text PDF
Tags : #Computer-Science---Computation-and-Language
Citation :