NAWI: News Articles and Knowledge
(2021 – 2026)
The aim of NAWI is to extract news narratives dynamically and in real time from a wide range of news articles. To this end, Named Entity Recognition and Relation Extraction methods are used to generate RDF triples, from which a temporal knowledge graph is built. End users will then be able to query and explore this graph individually through a web interface.
Tasks and objectives
Every day, vast numbers of news articles are published on the World Wide Web, and together they hold an enormous amount of information. However, these articles are unstructured data and must first be processed automatically before meaningful information can be extracted from them.
As part of the project, named entities such as names of people, companies, and places are first recognized and extracted from the news articles using Named Entity Recognition (NER). In addition, DBpedia Spotlight is used to detect so-called concepts in the articles, with the DBpedia knowledge graph serving as the reference for recognizing them. The extracted entities and concepts are then transferred into a syntax tree that forms the record of the respective news article.
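The description does not specify the tools or the exact record format, so the following is only a minimal sketch of how entity and concept annotations could be collected per article. It assumes a flat list of annotations rather than a full syntax tree, and the dictionaries `NER_GAZETTEER` and `CONCEPT_LEXICON` are hypothetical stand-ins for a trained NER model and for DBpedia Spotlight, used here only so the example runs offline.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Annotation:
    surface: str               # text span as it appears in the article
    start: int                 # character offset where the span begins
    label: str                 # entity type ("PER", "ORG", "LOC") or "CONCEPT"
    uri: Optional[str] = None  # DBpedia resource IRI for concepts, if linked

@dataclass
class ArticleRecord:
    article_id: str
    text: str
    annotations: List[Annotation] = field(default_factory=list)

# Hypothetical stand-ins: a real pipeline would call an NER model and
# DBpedia Spotlight here; dictionary lookups keep the sketch self-contained.
NER_GAZETTEER = {"Siemens": "ORG", "Berlin": "LOC", "Angela Merkel": "PER"}
CONCEPT_LEXICON = {"inflation": "http://dbpedia.org/resource/Inflation"}

def _matches(term: str, text: str, ignore_case: bool = False):
    """Return an iterator over whole-word matches of term in text."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.finditer(r"\b" + re.escape(term) + r"\b", text, flags)

def annotate(article_id: str, text: str) -> ArticleRecord:
    """Collect entity and concept annotations for one article, ordered by position."""
    record = ArticleRecord(article_id, text)
    for term, label in NER_GAZETTEER.items():
        for m in _matches(term, text):
            record.annotations.append(Annotation(m.group(), m.start(), label))
    for term, uri in CONCEPT_LEXICON.items():
        for m in _matches(term, text, ignore_case=True):
            record.annotations.append(Annotation(m.group(), m.start(), "CONCEPT", uri))
    record.annotations.sort(key=lambda a: a.start)
    return record
```

For example, `annotate("article-1", "Siemens opened a plant in Berlin as inflation rose.")` yields an ORG annotation, a LOC annotation, and a CONCEPT annotation linked to a DBpedia IRI, in that order. Keeping character offsets alongside each annotation is what later allows relations to be anchored back to the exact text span.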
Once extraction is complete, relations between entities or concepts are identified and combined into statement triples of the form (subject, predicate, object). At the same time, it must always remain possible to trace which news article each triple originates from.
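One common way to keep a triple traceable to its source in RDF is to place it in a named graph identified by the article, which N-Quads expresses as a fourth term. The sketch below assumes that approach; the project description does not say which provenance mechanism NAWI uses, and the `example.org` IRIs are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

@dataclass(frozen=True)
class SourcedTriple:
    subject: str    # IRI of the subject entity or concept
    predicate: str  # IRI of the extracted relation
    obj: str        # IRI of the object entity or concept
    source: str     # IRI of the news article the triple was extracted from

    def to_nquad(self) -> str:
        """Serialize as an N-Quads line, using the article IRI as the graph name."""
        return f"<{self.subject}> <{self.predicate}> <{self.obj}> <{self.source}> ."

def triples_from(store: Iterable[SourcedTriple], article: str) -> List[SourcedTriple]:
    """Return every triple that was extracted from one article."""
    return [t for t in store if t.source == article]

def sources_of(store: Iterable[SourcedTriple], s: str, p: str, o: str) -> Set[str]:
    """Return every article that supports a given (subject, predicate, object) statement."""
    return {t.source for t in store if (t.subject, t.predicate, t.obj) == (s, p, o)}
```

Because the same statement may be reported by several outlets, storing one quad per article (rather than deduplicating triples) keeps every source recoverable through `sources_of`.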
Finally, the project investigates how the extracted triples can be linked to a knowledge graph and, consequently, how that knowledge graph can be extended with them. An analysis of how the knowledge graph changes over time is then carried out to increase transparency.
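How NAWI analyzes temporal change is not specified here. As one illustrative option, assuming each triple carries the publication date of its source article, the graph can be sliced into time windows and two windows compared by set difference. The windowing scheme and the `ex:` names are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, Set, Tuple

Statement = Tuple[str, str, str]          # (subject, predicate, object)
DatedTriple = Tuple[str, str, str, date]  # statement plus article publication date
Window = Tuple[date, date]                # inclusive (start, end)

def snapshot(store: Iterable[DatedTriple], window: Window) -> Set[Statement]:
    """Statements backed by at least one article published inside the window."""
    start, end = window
    return {(s, p, o) for s, p, o, published in store if start <= published <= end}

def diff(store: Iterable[DatedTriple], window_a: Window, window_b: Window) -> Dict[str, Set[Statement]]:
    """Compare two windows: statements that appeared, disappeared, or persisted."""
    store = list(store)  # allow any iterable to be scanned twice
    a, b = snapshot(store, window_a), snapshot(store, window_b)
    return {"added": b - a, "removed": a - b, "stable": a & b}
```

Comparing consecutive windows this way surfaces shifts in reporting, such as a relation that is mentioned in one quarter and no longer in the next, which is the kind of change a visual interface could then highlight.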
For the graphical interface, suitable visual analytics methods are to be evaluated.