
Paper accepted at ACL 2025
21 May 2025
Named Entity Recognition (NER), i.e. the recognition of named entities such as people and places in text, is a widely used task in Natural Language Processing (NLP) and underpins many of its applications. The paper by Florian Babl, Moritz Hennen, Jakob Murauer and Michaela Geierhos draws attention to the widespread contamination of test datasets in this area. Contamination in this context means that certain person names appear in both the training and the test data, so a model can score well on names it has already seen. The authors also measure the impact of this contamination on three state-of-the-art models, whose generalization ability deteriorates by 2-10%. Finally, they present a new approach for creating NER datasets which, according to the authors, is the first of its kind to solve these problems.
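To make the notion of contamination concrete, the following is a minimal sketch of how the overlap of person names between a training and a test split could be measured. It is not the authors' method: the BIO-tagged input format, the `PER` label, the lowercase matching and the function names `extract_entities` and `contamination_rate` are all assumptions chosen for illustration.

```python
from typing import List, Set, Tuple

# Assumed input format: each sentence is a list of (token, BIO tag) pairs.
Sentence = List[Tuple[str, str]]


def extract_entities(sentences: List[Sentence], label: str = "PER") -> Set[str]:
    """Collect the lowercased surface forms of all entities with the given label."""
    entities: Set[str] = set()
    for sentence in sentences:
        current: List[str] = []
        for token, tag in sentence:
            if tag == f"B-{label}":
                # A new entity starts; store the previous one if it is still open.
                if current:
                    entities.add(" ".join(current).lower())
                current = [token]
            elif tag == f"I-{label}" and current:
                current.append(token)
            else:
                # Any other tag closes an open entity.
                if current:
                    entities.add(" ".join(current).lower())
                current = []
        if current:
            entities.add(" ".join(current).lower())
    return entities


def contamination_rate(train: List[Sentence], test: List[Sentence], label: str = "PER") -> float:
    """Share of distinct test entities that also occur in the training data."""
    train_entities = extract_entities(train, label)
    test_entities = extract_entities(test, label)
    if not test_entities:
        return 0.0
    return len(test_entities & train_entities) / len(test_entities)


# Hypothetical toy data, not taken from the paper.
train = [[("Angela", "B-PER"), ("Merkel", "I-PER"), ("visited", "O"), ("Paris", "B-LOC")]]
test = [
    [("Angela", "B-PER"), ("Merkel", "I-PER"), ("spoke", "O")],
    [("Olaf", "B-PER"), ("Scholz", "I-PER"), ("replied", "O")],
]
print(contamination_rate(train, test))  # 0.5
```

In this toy example one of the two distinct person names in the test split also occurs in the training split, so half of the test names count as contaminated. Exact string matching is the simplest possible criterion; a real analysis would likely also have to deal with name variants and partial matches.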
More about this paper: https://2025.aclweb.org/program/