τ-bench - A Benchmark for Tool-Agent-User Interaction in Real-World Domains
COIN
Compressed Chain of Thought - Efficient Reasoning Through Dense Representations
DeepSeek-R1
Deliberative Alignment - Reasoning Enables Safer Language Models
G-Eval - NLG Evaluation using GPT-4 with Better Human Alignment
How To 100M Learning Text Video
How to Read a Paper
How To Write a Paper
How to Write Academic Paper (from CS Perspective)
Investigating Continual Pretraining in Large Language Models - Insights and Implications
Is a Question Decomposition Unit All We Need
Large Language Models are Zero-Shot Rankers for Recommender Systems
Molmo and PixMo
MultiVENT
OpenPI-C
Paper Template
Piecing It All Together - Verifying Multi-Hop Multimodal Claims
PubMedQA - A Dataset for Biomedical Research Question Answering
Scientific Fact-Checking - A Survey of Resources and Approaches
Semantic Product Search for Matching Structured Product Catalogs in E-Commerce
Token Assorted - Mixing Latent and Text Tokens for Improved Language Model Reasoning
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction
What is More Likely to Happen Next
Zotero Template |