Discussion
While the employer requirements taxonomy is generated entirely automatically without expert input, it appears to perform reasonably well in identifying distinct groups of skills, competences and knowledge. As shown in Table and the plots in Appendix, the cluster profiles, especially at the first and second layers of the skills requirements hierarchy, reflect established occupational domains, such as education, health, information technology and business administration. The metadata on skills requirements clusters, such as salary and job titles, also appear generally aligned with the data from official statistics. For example, the skills requirements with the highest minimum salary are located in finance, tax and compliance and software engineering, while the lowest paid employer requirements are in caregiving and retail (all of these clusters reside in the third layer of the hierarchy).
Comparison to 2017 shows stability.
Initial results for the employer requirements taxonomy demonstrate that a data-driven approach to grouping the skills, competences and knowledge needed for performing jobs has its merits. At the same time, in its current state the methodology for deriving the hierarchical taxonomy has several limitations. The first limitation is the declining informativeness of clusters at the tips of the tree. At the moment, each successive layer of the hierarchy contains subclusters that were theoretically viable. However, the fact that splitting a cluster improves the modularity score does not mean that the resulting lower-level clusters are well separated. It is likely that, with increasing depth, the clusters become fragmented and are driven by stochastic artefacts rather than meaningful differences. The challenge is to identify an objective criterion for determining the stability of, and confidence in, the cluster partitioning. One potential solution is to apply an approach commonly used in phylogenetics, where a consensus tree is built after multiple iterations of generating a hierarchy from bootstrapped samples of the original data. Bootstrapping would potentially allow us to reduce spurious variation in the underlying data and to identify whether detected patterns of cluster partitioning are highly stable. For example, using this method we can test, at each depth of the tree, the extent to which employer requirements are consistently grouped together. Splitting should stop when the resulting subclusters do not demonstrate high confidence (for instance, when requirements are grouped together only 50% of the time).
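The bootstrapped stability check described above can be illustrated with a minimal sketch. It is not the method used in this work: the clustering step is a deliberately simple stand-in (connected components of a co-occurrence graph) rather than modularity optimisation, and the function names, the toy adverts and the 0.5 threshold are all assumptions made for illustration. The sketch resamples job adverts with replacement, re-clusters each replicate, and records how often each pair of requirements ends up in the same cluster.

```python
import random
from collections import Counter
from itertools import combinations


def cluster_by_components(adverts, min_count=2):
    """Hypothetical stand-in for the clustering step: connected components of
    the requirement co-occurrence graph, keeping edges seen >= min_count times."""
    counts = Counter()
    for advert in adverts:
        for pair in combinations(sorted(set(advert)), 2):
            counts[pair] += 1

    # Union-find over every requirement that appears in the sample.
    parent = {req: req for advert in adverts for req in advert}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        if n >= min_count:
            parent[find(a)] = find(b)
    return {req: find(req) for req in list(parent)}


def coassignment_frequency(adverts, cluster_fn, n_boot=200, seed=0):
    """Share of bootstrap replicates in which each pair of requirements
    lands in the same cluster (counted only when both are present)."""
    rng = random.Random(seed)
    together, present = Counter(), Counter()
    for _ in range(n_boot):
        sample = rng.choices(adverts, k=len(adverts))  # resample with replacement
        labels = cluster_fn(sample)
        for a, b in combinations(sorted(labels), 2):
            present[(a, b)] += 1
            together[(a, b)] += labels[a] == labels[b]
    return {pair: together[pair] / present[pair] for pair in present}


# Toy adverts, invented for illustration only: two tight groups joined by
# a single advert that mentions one requirement from each.
adverts = (
    [["python", "sql"]] * 10
    + [["nursing", "first aid"]] * 10
    + [["python", "first aid"]]
)
freq = coassignment_frequency(adverts, cluster_by_components)

# Pairs grouped together in more than half of the replicates.
stable_pairs = sorted(pair for pair, f in freq.items() if f > 0.5)
```

In this toy setting, the single bridging advert only occasionally merges the two groups, so cross-group pairs receive low co-assignment frequencies. Passing a different `cluster_fn` would allow the same check to be run with any clustering procedure that returns a requirement-to-cluster mapping.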
The second limitation is that we do not allow employer requirements to exist in multiple locations. The current hierarchical structure places each skill in the cluster to whose members it is most strongly connected. However, it is likely that certain skills, competences and knowledge, such as cooking and biology, will have lateral links to other clusters. For example, cooking resides in caregiving, but can also be connected to food service in retail. Similarly, biology currently resides in pathology, but is also related to education. To address the limitations of hierarchical hard partitioning of requirements, we propose to complement the provided hierarchical structure of employer requirements with a simplified graph of requirements. In this graph, all vertices will be contracted to their layer 3 clusters. The links between the 146 clusters can then be aggregated and used to explore the lateral relationships between the employer requirements.
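The contraction step can be sketched as follows, under stated assumptions: the requirement-level graph is available as a weighted edge list, each requirement has a single layer 3 cluster label, and aggregation is a simple sum of link weights. The function name, the cluster assignments and the weights below are hypothetical and chosen only to mirror the cooking and biology examples.

```python
from collections import defaultdict


def contract_to_clusters(edges, cluster_of):
    """Sum edge weights between clusters; links inside a cluster are dropped."""
    lateral = defaultdict(float)
    for u, v, weight in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            # Sort the pair so that (A, B) and (B, A) share one undirected key.
            lateral[tuple(sorted((cu, cv)))] += weight
    return dict(lateral)


# Hypothetical cluster assignments and link weights, for illustration only.
cluster_of = {
    "cooking": "caregiving",
    "elderly care": "caregiving",
    "food service": "retail",
    "customer service": "retail",
    "biology": "pathology",
    "teaching": "education",
}
edges = [
    ("cooking", "elderly care", 1.0),  # within caregiving, dropped
    ("cooking", "food service", 0.5),
    ("elderly care", "customer service", 0.25),
    ("biology", "teaching", 0.5),
]
lateral_links = contract_to_clusters(edges, cluster_of)
```

Summing weights is only one possible aggregation; a mean, or a count of links above some threshold, would highlight different lateral relationships.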
Finally, it is not currently clear how best to incorporate incoming information from new job adverts. In future work, we would like to explore the advantages and disadvantages of rerunning the analysis from the start on all available data (with new information added), as opposed to generating the word embeddings and the taxonomy on temporal slices of the data.
Conclusion
In this work we demonstrate how a taxonomy of employer requirements can be derived in a data-driven way. Using initial results of the proposed method, we show that the automatically generated taxonomy of employer requirements performs reasonably well. The taxonomy contains four hierarchical layers, which are identified by iteratively applying a modularity-optimisation community detection algorithm. The quality of the clustering is enhanced by using a word embeddings approach to capture the strength of relationships between the requirements, as opposed to relying only on a frequency-based measure.
In addition to generating the taxonomy, we also extract useful metadata on each cluster of employer requirements, mapping relationships between skills requirements clusters and salary, occupations, and job titles. We also trial a method for determining the level of specialisation of a skill by applying Gaussian mixture models to the eigenvector centrality of employer requirements.
The contributions of this work are manifold. The proposed employer requirements taxonomy represents a first taxonomy that is not expert-driven and is independent of established frameworks such as ESCO or O*NET. The taxonomy is developed automatically and identifies meaningful patterns in the employer requirements without any preconditions on how requirements should be grouped. Because of this, the taxonomy minimises the risk that interrelationships between skills are overlooked because they do not fit a traditional view of how skills should be organised. For example, agricultural skills are usually grouped in their own separate category, while in our taxonomy they reside in grounds maintenance, because many of the associated skills, such as using fertilisers, are similar to requirements in landscaping and gardening occupations. Therefore, the taxonomy provides a unique opportunity for validating expert-derived taxonomies.
The resulting skills taxonomy, as well as the algorithm for developing it and an interactive data visualisation, will be released to the public. We believe that these resources will benefit a wide audience and allow policymakers, educators and individuals to better understand the skill sets needed by employers and the associated salaries and job titles. The taxonomy also provides a foundation for measuring the similarity of jobs and occupations based on employer requirements. These insights could then be directly applied to inform policy on reskilling and to identify job transition opportunities for occupations at risk of decline.
In future research, we plan to increase the robustness of the proposed methodology by adding a bootstrapping stage to ensure the stability of the resulting groups.
We will also extend the current hierarchical representation of the taxonomy into an ontology that captures not just the direct but also the lateral relationships between clusters. The resulting ontology can then be implemented as a graph database accessible to the public.
We would also be interested in studying the evolution of employer requirements over time using the methodology described in Rosvall and Bergstrom (2010).