Introduction
This paper is structured as follows. In the introduction, we present the issues that researchers face when addressing massively multi-label and/or multi-class problems. The main problems are:
- Class identities or labels on large datasets often come from poorly qualified sources. More importantly, the annotators are likely to be numerous and to come from varied backgrounds, which inevitably leads to annotation noise in the form of mistakes, incomplete labels and, more worryingly, a lack of consensus on the semantics. A major example is Google AudioSet, where hundreds of labels of very different natures are combined and seemingly used together by learning algorithms.
- The relationship between a tag and an item is rarely defined explicitly (unless the tags arise from a well-defined ontology).