References
[1] Holtzman, A., West, P., Shwartz, V., Choi, Y., & Zettlemoyer, L. Surface form competition: Why the highest probability answer isn't always right. EMNLP 2021.

[2] Zhao, Z., Wallace, E., Feng, S., Klein, D., & Singh, S. Calibrate before use: Improving few-shot performance of language models. ICML 2021.

[3] Min, S., Lewis, M., Hajishirzi, H., & Zettlemoyer, L. Noisy channel language model prompting for few-shot text classification. arXiv 2021.

[4] Shridhar, M., Thomason, J., Gordon, D., Bisk, Y., Han, W., Mottaghi, R., Zettlemoyer, L., & Fox, D. ALFRED: A benchmark for interpreting grounded instructions for everyday tasks. CVPR 2020.

[5] Singh, K. P., Bhambri, S., Kim, B., Mottaghi, R., & Choi, J. Factorizing perception and policy for interactive instruction following. ICCV 2021.

[6] Min, S. Y., Chaplot, D. S., Ravikumar, P., Bisk, Y., & Salakhutdinov, R. FILM: Following instructions in language with modular methods. ICLR 2022.

[7] Blukis, V., Paxton, C., Fox, D., Garg, A., & Artzi, Y. A persistent spatial semantic representation for high-level natural language instruction execution. CoRL 2022.