A Cluster-based Sample Selection Strategy for Biological Event Extraction
Abstract— Biological event extraction is an important and difficult task whose purpose is to obtain biomedical knowledge from the literature by identifying biomedical entities and extracting the complex relations among them. However, biological events are highly complex and annotated biomedical corpora are highly imbalanced, which degrades classifier performance. In this paper, we develop a new semi-supervised biomedical event extraction method based on a pairwise model. First, we use an initially trained classifier to predict trigger-argument pairs in the sentences of unlabeled data. Second, we present a sentence representation method that combines dependency paths with word embeddings learned from biomedical text. We then cluster these representations and apply a sample selection strategy to filter out noisy samples; the remaining samples are added to the training set to balance it. Experimental results on the BioNLP-ST GENIA corpus show that the proposed method is effective and improves performance.
Index Terms— Biological Event Extraction, Sentence Representation, Cluster
Yang Lu, Xiaolei Ma, Yinan Lu
College of Computer Science and Technology, Jilin University, Changchun, CHINA
Yang Lu, Xiaolei Ma
Library, Inner Mongolia University for Nationalities, CHINA
Cite: Yang Lu, Xiaolei Ma, Yinan Lu, "A Cluster-based Sample Selection Strategy for Biological Event Extraction," Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering, pp. 72-77, Hong Kong, 15-17 June, 2019.