Detecting Multi Word Terms in patents the same way as entities
In English patent document information retrieval, Multi Word Terms (MWTs) are an important factor in determining how relevant a patent document is for a particular search query. Detecting the correct boundaries of these MWTs is no trivial task and is often complicated by the special writing style of the patent domain.
In this paper, we describe a method for detecting MWTs in patent sentences based on a method for detecting technical entities using deep learning. On our annotated dataset of 22 patents, our method achieved an average precision of 0.75, an average recall of 0.74 and an average F1 score of 0.74. Further, we argue for the use of domain-specific word embedding resources and suggest that our model mostly learns whether individual words should be included in MWTs or not.
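To make the reported averages concrete, the following is a minimal sketch of how precision, recall and F1 could be computed per patent and then averaged over documents. It is not taken from the paper: the B/I/O tagging scheme, the exact-boundary matching criterion, the per-document (macro) averaging and all function names (`bio_to_spans`, `prf`, `macro_average`) are assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect term spans from a B/I/O tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new term starts here; close any term that is still open.
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F1 with exact boundary matching (assumed)."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_average(
    docs: List[Tuple[Sequence[str], Sequence[str]]]
) -> Tuple[float, float, float]:
    """Average per-document scores; docs holds (gold_tags, pred_tags) pairs."""
    scores = [prf(bio_to_spans(g), bio_to_spans(p)) for g, p in docs]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Hypothetical example: "a semiconductor memory device comprising a gate electrode"
gold = ["O", "B", "I", "I", "O", "O", "B", "I"]
pred = ["O", "B", "I", "I", "O", "O", "O", "B"]  # second term has a wrong boundary
print(prf(bio_to_spans(gold), bio_to_spans(pred)))
```

Under per-document averaging of this kind, the average F1 score is the mean of the per-document F1 values, so it does not have to equal the harmonic mean of the average precision and average recall.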
T. Fink, L. Andersson, A. Hanbury, Detecting Multi Word Terms in patents the same way as entities, World Patent Information 67 (2021) 102078