Data

The online job advert dataset used in this paper was provided by Burning Glass Technologies, a labour market analytics company. Burning Glass collects data on active job postings from thousands of web pages on a daily basis \cite{burning_glass_technologies_markets_2017}. For each job posting, Burning Glass extracts the job title, salary, and education and experience requirements, and also identifies keywords in the free-text job description. The full job descriptions are not available. We refer to these keywords as skills, although they cover not only skills in the narrow sense but also personal competences and knowledge required by employers. To develop the initial skills taxonomy, we use over 32 million adverts collected over the five-year period from January 2012 to December 2016. The data for 2017 will be used for comparison and for out-of-sample validation of the taxonomy.
Many adverts in our dataset have missing information: only 61\% of adverts contain data on the offered salary, and substantially fewer specify education (19\% of adverts) or experience requirements (13\% of adverts).
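As an illustration of how the sample split and the coverage figures could be derived, the sketch below separates adverts by posting year into a taxonomy-building set (2012 to 2016) and a later hold-out set (2017), then computes the share of adverts in which each field is populated. The record layout (a `date` field plus optional `salary`, `education` and `experience` fields set to `None` when missing), the function names and the toy records are all hypothetical; the actual Burning Glass schema is not described here.

```python
from datetime import date

# Hypothetical advert records; field names are illustrative, not the Burning Glass schema.
# A field set to None is treated as missing.
adverts = [
    {"date": date(2012, 3, 1), "salary": 25000, "education": None, "experience": None},
    {"date": date(2014, 7, 15), "salary": 31000, "education": "Bachelor's", "experience": None},
    {"date": date(2016, 12, 31), "salary": None, "education": None, "experience": 2},
    {"date": date(2015, 1, 9), "salary": 40000, "education": None, "experience": None},
    {"date": date(2017, 5, 2), "salary": 28000, "education": None, "experience": None},
]


def split_by_year(records, last_training_year=2016):
    """Split records into a taxonomy-building set and a later hold-out set."""
    train = [r for r in records if r["date"].year <= last_training_year]
    holdout = [r for r in records if r["date"].year > last_training_year]
    return train, holdout


def field_coverage(records, fields=("salary", "education", "experience")):
    """Return, for each field, the share of records in which it is not None."""
    if not records:
        raise ValueError("cannot compute coverage of an empty set of records")
    n = len(records)
    return {f: sum(r.get(f) is not None for r in records) / n for f in fields}


train, holdout = split_by_year(adverts)
print(len(train), len(holdout))
print(field_coverage(train))
```

On the real data, the same coverage calculation applied to the full sample is what would yield shares such as those reported above; the toy records here are not meant to reproduce them.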

Methods

We use two approaches to measure the relationships between skills mentioned in Burning Glass job adverts (Figure \ref{992146}). The first approach is based on the pairwise frequency with which two skills appear in the same job advert. The second approach is based on distributed representations of skills: we generate vector representations of skills by training a word2vec model, which learns the extent to which skills occur in the same context (i.e. alongside the same other skills).
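The pairwise-frequency approach can be sketched as follows, assuming each advert has already been reduced to a list of skill strings. The function name `cooccurrence_counts` and the toy adverts are hypothetical, and this is not necessarily how the counts were computed for this paper. Each unordered pair is stored under an alphabetically sorted tuple, so that ("Python", "SQL") and ("SQL", "Python") share a key, and a skill repeated within one advert is counted once. For the second approach, word2vec vectors could be trained with an off-the-shelf implementation such as gensim's `Word2Vec` class, treating (as an assumption) each advert's skill list as one "sentence"; that step is not shown here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(adverts):
    """Count, for each unordered pair of skills, the number of adverts mentioning both."""
    counts = Counter()
    for skills in adverts:
        # Deduplicate and sort so each pair is counted once per advert under one key.
        for pair in combinations(sorted(set(skills)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy adverts, each represented only by its list of extracted skills.
adverts = [
    ["Python", "SQL", "Machine Learning"],
    ["SQL", "Python"],
    ["Excel", "SQL"],
]
counts = cooccurrence_counts(adverts)
print(counts[("Python", "SQL")])
```

Because pairs are keyed in sorted order, a lookup must also use the sorted tuple; `counts[("SQL", "Python")]` would return 0 from the `Counter` rather than the stored count.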
Next, we model the skills as a graph in which vertices represent individual skills. Two vertices are joined by an edge if the corresponding skills are mentioned in the same advert. Each edge has attributes that describe the strength of the relationship, such as the co-occurrence frequency (the total number of times the two skills are mentioned in the same advert) and the cosine similarity between the two skills' word2vec vectors (a measure of how similar the contexts are in which the two skills occur across all adverts).
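A minimal sketch of the resulting graph, stored as a plain edge dictionary, is given below. It assumes the skill vectors are already available in a dictionary `vectors`; the tiny two-dimensional vectors used here are hypothetical stand-ins for trained word2vec embeddings, and the function names are illustrative. Cosine similarity is computed as dot(u, v) / (|u| |v|), which assumes non-zero vectors. An edge is created only for pairs that co-occur in at least one advert, so two skills with similar vectors but no shared advert get no edge.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_skill_graph(adverts, vectors):
    """Return edges {(skill_a, skill_b): {"frequency": n, "cosine": s}}.

    Vertices are skills; an edge exists only for pairs mentioned together
    in at least one advert.
    """
    frequency = Counter()
    for skills in adverts:
        for pair in combinations(sorted(set(skills)), 2):
            frequency[pair] += 1
    return {
        (a, b): {"frequency": n, "cosine": cosine_similarity(vectors[a], vectors[b])}
        for (a, b), n in frequency.items()
    }


# Hypothetical toy inputs: skill lists per advert and stand-in skill vectors.
adverts = [
    ["Python", "SQL", "Machine Learning"],
    ["SQL", "Python"],
    ["Excel", "SQL"],
]
vectors = {
    "Python": [3.0, 4.0],
    "SQL": [4.0, 3.0],
    "Excel": [0.0, 1.0],
    "Machine Learning": [3.0, 4.0],
}
edges = build_skill_graph(adverts, vectors)
for (a, b), attrs in sorted(edges.items()):
    print(a, b, attrs)
```

A graph library such as networkx could hold the same information, for example by calling `G.add_edge(a, b, frequency=n, cosine=s)` on a `networkx.Graph`; the plain dictionary keeps the sketch dependency-free.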