Methods

To generate the employer requirements taxonomy, we start by capturing the relationships between requirements as they are mentioned in Burning Glass job adverts. We use two approaches to measure these interrelationships. The first is based on the pairwise frequency with which keywords are mentioned together in a job advert. The second is based on distributed representations of keywords, which we generate by training a word2vec model that learns the extent to which keywords occur in the same context (i.e. together with similar words). As a next step, we model the employer requirements as a graph in which vertices represent individual requirements. Two vertices are joined by an edge if the corresponding requirements are mentioned in the same advert. Each edge has attributes that describe the strength of the relationship: the weight (the total number of pairwise keyword mentions) and the cosine similarity (the similarity of the contexts in which the two requirements occur across all adverts). Once employer requirements are represented as a network, we apply a modularity-optimisation community detection algorithm to identify nested clusters of requirements. However, to improve the quality of the clusters, we first identify and address the presence of highly transversal skills, competences and knowledge among the employer requirements.