References

[1] OpenAI, “ChatGPT (September 25 Version).” 2023. Available: https://chat.openai.com/chat

[2] B. Goertzel, “Artificial general intelligence: concept, state of the art, and future prospects.” Journal of Artificial General Intelligence, 2014.

[3] T. Brown et al., “Language models are few-shot learners.” Advances in Neural Information Processing Systems, 2020.

[4] H. Touvron et al., “Llama 2: Open foundation and fine-tuned chat models.” arXiv preprint arXiv:2307.09288, 2023.

[5] “Common Crawl.” Available: https://commoncrawl.org/

[6] C. Raffel et al., “Exploring the limits of transfer learning with a unified text-to-text transformer.” Journal of Machine Learning Research, 2020.

[7] Together Computer, “RedPajama: An open source recipe to reproduce LLaMA training dataset.” 2023.

[8] G. Penedo et al., “The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only.” arXiv preprint arXiv:2306.01116, 2023.

[9] “Wikipedia.” Available: https://en.wikipedia.org/wiki/

[10] D. Kocetkov et al., “The Stack: 3TB of permissively licensed source code.” arXiv preprint arXiv:2211.15533, 2022.

[11] L. Gao et al., “The Pile: An 800GB dataset of diverse text for language modeling.” arXiv preprint arXiv:2101.00027, 2020.

[12] Y. Zhu et al., “Aligning books and movies: Towards story-like visual explanations by watching movies and reading books.” Proc. of ICCV, 2015.

[13] “arXiv.” Available: https://arxiv.org/

[14] “PubMed Central.” Available: https://www.ncbi.nlm.nih.gov/pmc/about/intro/

[15] “United States Patent and Trademark Office.” Available: https://www.uspto.gov/

[16] W. X. Zhao et al., “A survey of large language models.” arXiv preprint arXiv:2303.18223, 2023.

[17] D. Paperno et al., “The LAMBADA dataset: Word prediction requiring a broad discourse context.” arXiv preprint arXiv:1606.06031, 2016.

[18] S. Narayan et al., “Don’t give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization.” arXiv preprint arXiv:1808.08745, 2018.

[19] T. Kwiatkowski et al., “Natural questions: a benchmark for question answering research.” Transactions of the Association for Computational Linguistics, 2019.

[20] P. Clark et al., “Think you have solved question answering? Try ARC, the AI2 reasoning challenge.” arXiv preprint arXiv:1803.05457, 2018.

[21] S. Lin et al., “TruthfulQA: Measuring how models mimic human falsehoods.” arXiv preprint arXiv:2109.07958, 2021.

[22] Y. Bisk et al., “PIQA: Reasoning about physical commonsense in natural language.” Proc. of AAAI, 2020.

[23] R. Zellers et al., “HellaSwag: Can a machine really finish your sentence?” arXiv preprint arXiv:1905.07830, 2019.

[24] K. Sakaguchi et al., “WinoGrande: An adversarial Winograd schema challenge at scale.” Communications of the ACM, 2021.

[25] K. Cobbe et al., “Training verifiers to solve math word problems.” arXiv preprint arXiv:2110.14168, 2021.

[26] D. Hendrycks et al., “Measuring massive multitask language understanding.” arXiv preprint arXiv:2009.03300, 2020.

[27] A. Srivastava et al., “Beyond the imitation game: Quantifying and extrapolating the capabilities of language models.” arXiv preprint arXiv:2206.04615, 2022.

[28] P. Liang et al., “Holistic evaluation of language models.” arXiv preprint arXiv:2211.09110, 2022.

[29] G. Wenzek et al., “CCNet: Extracting high quality monolingual datasets from web crawl data.” arXiv preprint arXiv:1911.00359, 2019.

[30] K. Lee et al., “Deduplicating training data makes language models better.” arXiv preprint arXiv:2107.06499, 2021.

[31] A. Vaswani et al., “Attention is all you need.” Advances in Neural Information Processing Systems, 2017.

[32] A. Radford et al., “Improving language understanding by generative pre-training.” 2018.

[33] J. Wei et al., “Finetuned language models are zero-shot learners.” Proc. of ICLR, 2022.

[34] P. Christiano et al., “Deep reinforcement learning from human preferences.” Advances in Neural Information Processing Systems, 2017.

[35] OpenAI, “GPT-4 technical report.” arXiv preprint arXiv:2303.08774, 2023.

[36] Google, “PaLM 2 technical report.” arXiv preprint arXiv:2305.10403, 2023.

[37] J. Devlin et al., “BERT: Pre-training of deep bidirectional transformers for language understanding.” Proc. of NAACL-HLT, 2019.

[38] C. Raffel et al., “Exploring the limits of transfer learning with a unified text-to-text transformer.” Journal of Machine Learning Research, 2020.

[39] J. Ao et al., “SpeechT5: Unified-modal encoder-decoder pre-training for spoken language processing.” Proc. of ACL, 2022.

[40] A. Dosovitskiy et al., “An image is worth 16x16 words: Transformers for image recognition at scale.” Proc. of ICLR, 2021.

[41] L. Xue et al., “mT5: A massively multilingual pre-trained text-to-text transformer.” Proc. of NAACL-HLT, 2021.

[42] “KorQuAD 1.0.” Available: https://korquad.github.io/KorQuad%201.0/

[43] A. Radford et al., “Language models are unsupervised multitask learners.” OpenAI Blog, 2019.

[44] V. Sanh et al., “Multitask prompted training enables zero-shot task generalization.” Proc. of ICLR, 2022.

[45] Y. Wang et al., “Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks.” Proc. of EMNLP, 2022.

[46] N. Muennighoff et al., “Crosslingual generalization through multitask finetuning.” Proc. of ACL, 2023.

[47] Z. Ji et al., “Survey of hallucination in natural language generation.” ACM Computing Surveys, 2023.

[48] L. Wang et al., “A survey on large language model based autonomous agents.” arXiv preprint arXiv:2308.11432, 2023.

[49] Y. Bai et al., “Training a helpful and harmless assistant with reinforcement learning from human feedback.” arXiv preprint arXiv:2204.05862, 2022.

[50] N. Stiennon et al., “Learning to summarize with human feedback.” Advances in Neural Information Processing Systems, 2020.

[51] L. Ouyang et al., “Training language models to follow instructions with human feedback.” Advances in Neural Information Processing Systems, 2022.

[52] R. Rafailov et al., “Direct preference optimization: Your language model is secretly a reward model.” arXiv preprint arXiv:2305.18290, 2023.

[53] H. Liu et al., “Languages are rewards: Hindsight finetuning using human feedback.” arXiv preprint arXiv:2302.02676, 2023.

[54] R. Ramamurthy et al., “Is reinforcement learning (not) for natural language processing? Benchmarks, baselines, and building blocks for natural language policy optimization.” arXiv preprint arXiv:2210.01241, 2022.

[55] C. Gulcehre et al., “Reinforced self-training (ReST) for language modeling.” arXiv preprint arXiv:2308.08998, 2023.

[56] R. A. Bradley and M. E. Terry, “Rank analysis of incomplete block designs: I. The method of paired comparisons.” Biometrika, 1952.

[57] L. Gao et al., “Scaling laws for reward model overoptimization.” Proc. of ICLR, 2023.

[58] J. Schulman et al., “Trust region policy optimization.” Proc. of ICML, 2015.

[59] J. Schulman et al., “Proximal policy optimization algorithms.” arXiv preprint arXiv:1707.06347, 2017.

[60] E. Beeching et al., “Open LLM Leaderboard.” Hugging Face, 2023.

[61] L. Zheng et al., “Judging LLM-as-a-judge with MT-bench and Chatbot Arena.” arXiv preprint arXiv:2306.05685, 2023.

[62] P. Lewis et al., “Retrieval-augmented generation for knowledge-intensive NLP tasks.” Proc. of NeurIPS, 2020.

[63] “Retrieval Augmented Generation (RAG).” Available: https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html

[64] Y. Lee et al., “QASA: Advanced question answering on scientific articles.” Proc. of ICML, 2023.