Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey
Philipp Mondorf & Barbara Plank. (2024). Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey. arXiv:2404.01869 [cs.CL].
Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning
Philipp Mondorf & Barbara Plank. (2024). Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning. arXiv:2402.14856 [cs.CL].