Data Preparation

To prepare the data for the analysis, we first clean the skills by removing any punctuation, special characters and numbers. We also exclude several groups of skills: frequently inaccurate skills, phrases related to industry experience rather than to skills, and skills mentioned three or fewer times over the six-year period. The 14 frequently inaccurate skills were identified in a previous analysis of Burning Glass adverts and include acronyms that can refer to different sets of words when expanded (e.g. CPR: cardiopulmonary resuscitation or Civil Procedure Rules) and irrelevant phrases scraped accidentally from web pages (e.g. “image processing”, “facebook”). In total, 891 skills are excluded. These account for 7.8% of all skills and comprise 3.1% of all skill mentions.
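The cleaning and filtering steps above can be sketched as follows. This is an illustrative implementation, not the exact code used in the analysis; the function names and the structure of the inputs (a skill-to-mention-count mapping and an exclusion list) are our assumptions.

```python
import re


def clean_skill(skill):
    """Lowercase a raw skill string, strip punctuation, special
    characters and digits, and collapse the remaining whitespace."""
    skill = skill.lower()
    skill = re.sub(r"[^a-z\s]", " ", skill)  # drop punctuation and numbers
    return re.sub(r"\s+", " ", skill).strip()


def filter_skills(mention_counts, excluded, min_mentions=4):
    """Drop skills on the exclusion list (e.g. the 14 frequently
    inaccurate skills) and skills mentioned three or fewer times.
    `mention_counts` maps each skill to its total mentions."""
    return {skill: n for skill, n in mention_counts.items()
            if skill not in excluded and n >= min_mentions}
```

For example, `clean_skill("Project-Management 101")` yields `"project management"`, and `filter_skills` applied with the exclusion list removes both the listed skills and the rare ones in a single pass.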

Building the Skills Graph

We use a graph approach to model the relationships between skills. Representing skills as a graph allows us to go beyond studying pairwise connections between skills and identify groups of skills that are densely connected to each other. To build the skills graph, we traverse all job adverts and count the co-occurrences of skills. We then represent the counts as an (N, N) adjacency matrix, where N is the number of unique skills and the elements of the matrix indicate the number of skill co-occurrences. We then use the adjacency matrix to generate an undirected graph, G = (V, E), where the vertices, V, are the skills, and the edges, E, connect skills that co-occur in the same advert.
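The construction of the co-occurrence counts and the (N, N) adjacency matrix can be sketched as below. This is a minimal illustration assuming each advert is given as a list of its skills; the function names are ours, not from the original analysis.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def cooccurrence_counts(adverts):
    """For every unordered skill pair, count the number of adverts
    mentioning both skills (these become the edges of G)."""
    counts = Counter()
    for skills in adverts:
        # set() ensures a skill repeated within one advert counts once
        for a, b in combinations(sorted(set(skills)), 2):
            counts[(a, b)] += 1
    return counts


def adjacency_matrix(counts, vocab):
    """Build the symmetric (N, N) adjacency matrix over the skill
    vocabulary, with co-occurrence counts as matrix elements."""
    index = {skill: i for i, skill in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)), dtype=int)
    for (a, b), n in counts.items():
        i, j = index[a], index[b]
        A[i, j] = A[j, i] = n  # undirected graph: symmetric matrix
    return A
```

The adjacency matrix can then be handed to a graph library (e.g. networkx's `from_numpy_array`) to obtain the undirected graph G.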
To capture the strength of relationships in the network, we consider two candidate edge weights: frequency, f, and cosine similarity, c. Frequency is the most intuitive measure of skill relationships: for two skills connected by an edge, it is the total number of unique adverts that mention both skills. Put simply, the higher the frequency of the edge, the higher the co-occurrence of the skills and the stronger the relationship between them. However, this measure is flawed, since it amplifies the strength of relationships between frequently occurring skills, such as sales management or mechanical engineering, making it more difficult to detect substantive relationships between skills that are mentioned less frequently. The choice of metric for measuring edge strength has important implications for the community detection stage, since many algorithms use edge weights when partitioning the graph. In the case of skills, we are also likely to encounter situations where a skill (e.g. computer-aided design, predictive modelling) is mentioned in more than one domain, and using frequency would result in assigning the skill to the largest domain.
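The cosine-similarity weight c for an edge can be computed directly from the two skills' embedding vectors (introduced below). A minimal sketch, assuming the vectors are available as NumPy arrays:

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity between two skill embedding vectors:
    the dot product of the vectors divided by the product of
    their Euclidean norms."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Unlike raw frequency, this weight is bounded and does not grow with how often either skill appears overall, which is why it is less biased towards frequently occurring skills.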
As an alternative to a frequency-based measure of strength between skills, we trial context-based vector representations. These distributed representations, also known as word embeddings, refer to a Natural Language Processing technique used to capture semantic similarities of terms based on their distribution in large text corpora \citep*{jurafsky_question_2008}. Word embeddings convert terms, or in our case skills, into vectors that reflect the context in which the skills occur. The context refers to the probability that given terms or skills will be found together in a sentence or job advert. There is evidence in support of the higher accuracy of distributed representations of terms over frequency-based approaches \citep*{Zhao2015}. One of the leading methods for generating word embeddings is word2vec, which is based on shallow neural network language models \citep*{mikolov_distributed_2013}. To compute distributed representations of skills in the online job adverts, we use the continuous bag-of-words (CBOW) word2vec model as implemented in the Python gensim library \citep*{rehurek_lrec}. We train the word2vec model on 41 million online job adverts, ignoring adverts that mention 20 or more requirements. This filter is applied because exploratory analysis showed that adverts with over 20 keywords typically combine descriptions of more than one vacancy. As demonstrated in Table 1, the trained word2vec model is able to uncover meaningful patterns in job adverts.