[1] LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding, ACL, 2024.
[2] MATTER: Memory-Augmented Transformer Using Heterogeneous Knowledge Sources, ACL-Findings, 2024.
[3] EXAONE 3.0 7.8B Instruction Tuned Language Model, arXiv, 2024.
[4] LLaMA: Open and Efficient Foundation Language Models, arXiv, 2023.
[5] Language Models are Few-Shot Learners, NeurIPS, 2020.
[6] FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference, ACL-Findings, 2023.
[7] An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks, EMNLP, 2022.
[8] Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering, EACL, 2023.