References
[1] Eo, Moonjung, Suhyun Kang, and Wonjong Rhee. "LESS: Learning to select a structured architecture over filter pruning and low-rank decomposition." 5th Workshop on Practical ML for Limited/Low Resource Settings.

[2] Xia, Mengzhou, et al. "Sheared LLaMA: Accelerating language model pre-training via structured pruning." arXiv preprint arXiv:2310.06694 (2023).

[3] Xia, Mengzhou, Zexuan Zhong, and Danqi Chen. "Structured pruning learns compact and accurate models." arXiv preprint arXiv:2204.00408 (2022).

[4] TogetherAI. RedPajama-INCITE-Base-3B-v1, 2023.

[5] Zimmer, Max, Christoph Spiegel, and Sebastian Pokutta. "Sparse model soups: A recipe for improved pruning via model averaging." arXiv preprint arXiv:2306.16788 (2023).

[6] Matena, Michael S., and Colin A. Raffel. "Merging models with Fisher-weighted averaging." Advances in Neural Information Processing Systems 35 (2022): 17703-17716.

[7] Wortsman, Mitchell, et al. "Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time." International Conference on Machine Learning. PMLR, 2022.