[2] Xia, Mengzhou, et al. "Sheared LLaMA: Accelerating language model pre-training via structured pruning." arXiv preprint arXiv:2310.06694 (2023).
[3] Xia, Mengzhou, Zexuan Zhong, and Danqi Chen. "Structured pruning learns compact and accurate models." arXiv preprint arXiv:2204.00408 (2022).
[4] TogetherAI. "RedPajama-INCITE-Base-3B-v1." 2023.
[5] Zimmer, Max, Christoph Spiegel, and Sebastian Pokutta. "Sparse model soups: A recipe for improved pruning via model averaging." arXiv preprint arXiv:2306.16788 (2023).
[6] Matena, Michael S., and Colin A. Raffel. "Merging models with Fisher-weighted averaging." Advances in Neural Information Processing Systems 35 (2022): 17703-17716.
[7] Wortsman, Mitchell, et al. "Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time." International Conference on Machine Learning. PMLR, 2022.