[1] LG AI Research. "EXAONE 3.0 7.8B Instruction Tuned Language Model." arXiv preprint arXiv:2408.03541 (2024).
[2] LG AI Research. "EXAONE 3.5: Series of Large Language Models for Real-world Use Cases." arXiv preprint arXiv:2412.04862 (2024).
[3] LG AI Research. "EXAONE Deep: Reasoning Enhanced Language Models." arXiv preprint arXiv:2503.12524 (2025).
[4] Fedus, William, Barret Zoph, and Noam Shazeer. "Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity." Journal of Machine Learning Research 23.120 (2022): 1-39.
[5] Jiang, Albert Q., et al. "Mixtral of experts." arXiv preprint arXiv:2401.04088 (2024).
[6] Gale, Trevor, et al. "MegaBlocks: Efficient sparse training with mixture-of-experts." Proceedings of Machine Learning and Systems 5 (2023): 288-304.
[7] Dai, Damai, et al. "DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models." arXiv preprint arXiv:2401.06066 (2024).
[8] Liu, Aixin, et al. "DeepSeek-V3 technical report." arXiv preprint arXiv:2412.19437 (2024).
[9] Kaplan, Jared, et al. "Scaling laws for neural language models." arXiv preprint arXiv:2001.08361 (2020).
[10] Hoffmann, Jordan, et al. "Training compute-optimal large language models." arXiv preprint arXiv:2203.15556 (2022).
[11] Rafailov, Rafael, et al. "Direct preference optimization: Your language model is secretly a reward model." Advances in Neural Information Processing Systems 36 (2023): 53728-53741.
[12] Meng, Yu, Mengzhou Xia, and Danqi Chen. "SimPO: Simple preference optimization with a reference-free reward." Advances in Neural Information Processing Systems 37 (2024): 124198-124235.
[13] Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
[14] Shao, Zhihong, et al. "DeepSeekMath: Pushing the limits of mathematical reasoning in open language models." arXiv preprint arXiv:2402.03300 (2024).
[15] Lambert, Nathan, et al. "Tülu 3: Pushing frontiers in open language model post-training." arXiv preprint arXiv:2411.15124 (2024).
[16] OpenAI. "Learning to Reason with LLMs." September 12, 2024. https://openai.com/index/learning-to-reason-with-llms/.
[17] DeepSeek AI. "DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning." arXiv preprint arXiv:2501.12948 (2025).
[18] Anthropic. "Building Effective Agents." December 19, 2024. https://www.anthropic.com/engineering/building-effective-agents.
[19] Anthropic. "Introducing the Model Context Protocol." November 26, 2024. https://www.anthropic.com/news/model-context-protocol.
[20] OpenAI. "Computer-Using Agent." January 23, 2025. https://openai.com/index/computer-using-agent/.
[21] Anthropic. "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku." October 23, 2024. https://www.anthropic.com/news/3-5-models-and-computer-use.
[22] Silver, David, and Richard S. Sutton. "Welcome to the Era of Experience." Google DeepMind (2025).
[23] Xiao, Teng, et al. "SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters." arXiv preprint arXiv:2502.00883 (2025).
[24] Anthropic. "MCP Introduction." 2024. https://modelcontextprotocol.io/introduction.