Publications

Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models

Philipp Mondorf*, Sondre Wold* and Barbara Plank. 2024. Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models. ArXiv:2410.01434 [cs.LG].

Preprint 2024

Read Paper

Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models

Philipp Mondorf and Barbara Plank. 2024. Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models. In The 2024 Conference on Empirical Methods in Natural Language Processing, Miami, United States of America.

EMNLP 2024 Main

Read Paper | Code | Data

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz and Alberto Testoni. 2024. LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks. ArXiv:2406.18403 [cs.CL].

Preprint 2024

Read Paper | Code & Data

Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey

Philipp Mondorf and Barbara Plank. 2024. Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey. In First Conference on Language Modeling, Philadelphia, United States of America.

COLM 2024

Read Paper

Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning

Philipp Mondorf and Barbara Plank. 2024. Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9370–9402, Bangkok, Thailand. Association for Computational Linguistics.

ACL 2024 Main

Read Paper | Code | Data