Label noise reduction in entity typing by heterogeneous partial-label embedding

Defines a new task called Label Noise Reduction in Entity Typing (LNR): identifying the correct type labels for training examples. The paper also proposes a general framework called Heterogeneous Partial-Label Embedding (PLE) that jointly embeds entity mentions, text features, and entity types into the same vector space, so that objects with semantically similar types end up close together. The general approach to the LNR task is as follows:

  1. Model the true type labels within each candidate type set, treating only the “best” candidate type as relevant to the mention

  2. Extract different text features from entity mentions and their local contexts, and use corpus-level co-occurrences between mentions and features to model the mentions’ types

  3. Model type correlation (similarity) using mention–candidate-type and mention–feature co-occurrences to help predict the type path

The PLE approach realizes all of these elements. The first step is to construct a heterogeneous graph representing entity mentions, text features, and entity types, along with their relationships. The global objective is to jointly embed the graph into a low-dimensional space such that semantically close objects are close together.
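The denoising step above can be sketched as follows. This is a minimal toy illustration, not the paper's method: the embeddings here are random vectors, and all names (mentions, features, types) are hypothetical. In the real framework, the vectors would come from optimizing PLE's joint embedding objective over the heterogeneous graph; the sketch only shows how a noisy candidate type set is reduced to the single best-fitting type by scoring candidates against the mention's embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # embedding dimension (illustrative)

# Hypothetical vocabulary of mentions, text features, and entity types.
mentions = ["Barack Obama", "Washington"]
features = ["HEAD_obama", "CONTEXT_president", "HEAD_washington"]
types = ["person", "politician", "location"]

# Jointly learned embeddings would come from the PLE objective;
# random vectors stand in for them so the scoring step can be shown.
emb = {name: rng.normal(size=dim) for name in mentions + features + types}

def best_candidate_type(mention, candidate_types):
    """Pick the candidate type whose embedding scores highest against the
    mention's embedding (the 'best' type in step 1 of the LNR approach)."""
    scores = {t: float(emb[mention] @ emb[t]) for t in candidate_types}
    return max(scores, key=scores.get)

# A noisy candidate set (e.g. from distant supervision); keep only the best fit.
noisy_candidates = ["person", "politician", "location"]
print(best_candidate_type("Barack Obama", noisy_candidates))
```

In the actual framework, the scoring would follow training, where mention-feature and mention-type co-occurrence edges in the heterogeneous graph pull related objects together in the shared space.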

Future Work

  1. Extend PLE’s similarity function to model hierarchical type dependency, use multi-sense embedding to model topic contexts, and jointly exploit relation facts in a KB. Embeddings learned with PLE could also be used to predict unseen mentions, and to denoise training data in other domains (e.g., image annotation)

Label embedding for zero-shot fine-grained named entity typing

Proposes a new label embedding method that incorporates both prototypical and hierarchical information. The method is designed for fine-grained named entity typing (FNET), which faces two main challenges: a growing set of entity types and noisy, low-quality training data. The label embedding can be used to predict both seen and unseen entity types. The model was evaluated in two settings, few-shot learning and zero-shot learning, and performed well in both.
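The combination of prototypical and hierarchical information can be sketched as below. This is a toy illustration under stated assumptions, not the paper's exact formulation: the word vectors are random stand-ins for pre-trained embeddings, the prototype words and type paths are hypothetical, and the combination rule (average prototype vectors per type, then average over ancestors on the type path) is one simple way to mix the two signals. Because an unseen type's embedding is built from its prototype words and its ancestors, it can be scored zero-shot.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8  # embedding dimension (illustrative)

# Hypothetical pre-trained word embeddings for prototype words.
word_emb = {w: rng.normal(size=dim)
            for w in ["singer", "musician", "artist", "person", "athlete"]}

# Prototypical part: each type is associated with a few prototype words.
prototypes = {
    "/person": ["person"],
    "/person/artist": ["artist", "musician"],
    "/person/artist/singer": ["singer", "musician"],  # may be unseen at train time
}

def label_embedding(type_path):
    """Combine prototypical info (mean of prototype word vectors per type)
    with hierarchical info (mean over all ancestor types on the path)."""
    parts = type_path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i + 1]) for i in range(len(parts))]
    vecs = [np.mean([word_emb[w] for w in prototypes[t]], axis=0)
            for t in ancestors]
    return np.mean(vecs, axis=0)

def score(mention_vec, type_path):
    """Dot-product compatibility between a mention and a (possibly unseen) type."""
    return float(mention_vec @ label_embedding(type_path))

# Zero-shot: score a mention vector against a type never seen in training.
mention_vec = word_emb["singer"] + 0.1 * rng.normal(size=dim)
print(score(mention_vec, "/person/artist/singer"))
```

The point of the construction is that no type-specific parameters need to be learned for a new type: its embedding is derived entirely from prototype words and the hierarchy.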

Future Work

  1. Investigate whether performance would improve by incorporating other side information and a label noise reduction framework


