Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models – A Survey

Published in COLM 2024

Recommended citation: Philipp Mondorf and Barbara Plank. 2024. Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models – A Survey. In Proceedings of the First Conference on Language Modeling. URL: https://openreview.net/forum?id=Lmjgl2n11u.