This talk will be presented by CSH visitor Gábor Recski from TU Wien Informatics and will take place at the CSH on Wednesday, June 22 at 3 pm.
Abstract:
Information extraction (IE) is the task of detecting structured information in natural language text. Core IE tasks such as Named Entity Recognition (NER) and Relation Extraction (RE) enable NLP applications that map textual data to formal representations of their content. Common applications include customer service chatbots, large-scale media monitoring, and automated legal research. Each such IE solution addresses a narrow fragment of natural language understanding (NLU), such as recognizing mentions of entities and the relations between them (typical for technical documents), or classifying texts by intent or sentiment (typical for social media, for example).
Most IE solutions rely on end-to-end machine learning models that essentially perform pattern recognition on raw text. In the absence of an explicit task model, such solutions lack transparency, reliability, and customizability, and carry a high risk of unintended bias.
We will introduce an approach to building symbolic IE solutions using human-in-the-loop learning and explicit models of the syntax and semantics of natural language. Using simple examples, we will show how domain experts can easily bootstrap robust solutions for their custom text mining tasks.
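
To give a rough sense of what a symbolic, human-in-the-loop workflow might look like, here is a minimal Python sketch. It is not the speaker's system: the function names (`suggest_rules`, `classify`) and the toy data are hypothetical, and plain word sets stand in for the explicit syntactic and semantic models mentioned in the abstract. The idea it illustrates is that candidate rules are ranked on a handful of labeled examples, and a domain expert decides which ones to keep.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, whitespace-split tokens with surrounding punctuation removed."""
    stripped = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in stripped if t]


def classify(rules, text):
    """Return the label of the first rule whose words all occur in the text."""
    words = set(tokenize(text))
    for rule, label in rules:
        if rule <= words:
            return label
    return None


def suggest_rules(examples, label, min_precision=1.0):
    """Rank single-word candidate rules for `label` by precision, then support.

    Hypothetical helper: a real system would propose patterns over richer
    representations (e.g. syntactic or semantic graphs), not bare words.
    """
    hits, total = Counter(), Counter()
    for text, gold in examples:
        for word in set(tokenize(text)):
            total[word] += 1
            if gold == label:
                hits[word] += 1
    candidates = [
        (word, hits[word] / total[word], hits[word])
        for word in hits
        if hits[word] / total[word] >= min_precision
    ]
    return sorted(candidates, key=lambda c: (-c[1], -c[2], c[0]))


# Toy labeled examples (invented for illustration only).
examples = [
    ("Please cancel my subscription", "cancel"),
    ("I want to cancel the order", "cancel"),
    ("How do I reset my password", "other"),
    ("My order has not arrived", "other"),
]

# Step 1: the system proposes candidate rules.
suggestions = suggest_rules(examples, "cancel")

# Step 2: the expert reviews the ranking and accepts the top suggestion.
top_word = suggestions[0][0]
rules = [(frozenset({top_word}), "cancel")]

# Step 3: the accepted rules form a transparent, editable classifier.
prediction = classify(rules, "Can I cancel this?")
```

Because every decision traces back to a rule that a person has read and accepted, the resulting classifier can be inspected and corrected directly, which is the property the abstract contrasts with end-to-end models.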