One more relevant example for this study arises from the work of Gao
Daim: Demonstrated that technology forecasts for emerging technologies can be improved by combining patent-based statistics with bibliometric clustering and citation analysis for data acquisition (as a proxy indicator of technology diffusion when historical data is not available). This work subsequently coupled these patent and academic literature data-mining techniques with system dynamics modelling as a means of exploring causal relationships and non-linear behaviours in technology diffusion.
Samira Ranaei: More recently, text-mining approaches have been shown to improve the speed and accuracy of patent analysis methods, as demonstrated by Ranaei's automatic retrieval of patent records relevant to forecasting the development of electric and hydrogen vehicles.
This is addressed in a different manner by Gao, who instead utilises multiple extracted patent dimensions.
 
Presumptive substitutions tend to have a long development period before take-off occurs
Reactive substitutions generally take-off 

Methodology

There are many possible methods for gauging technological development. In this study, bibliometric data based on patent records has been used, as this has become a well-established means of assessment for both industry market comparisons and government policy setting. An overview of the considerations taken into account in method selection and development is given below.

Bibliometric data

Patent data in this analysis has been sourced from the Questel-Orbit patent search platform. More specifically, the full FamPat database was queried, which groups related invention-based patents filed in multiple international jurisdictions into patent families. Some of the core functionalities behind this search engine are outlined in \cite{Questel_Orbit_2000}. The platform is accessed by subscribers via an online search engine that allows complex patent record searches to be structured, saved, and exported in a variety of formats. A selection of keywords, dates, or classification categories is used in this search engine to build relevant queries for a given technology (this process is discussed in more detail in section \ref{108157}). The search terms provided are then matched against the title, abstract, and key content of all family members included in a FamPat record. Unlike title and abstract searches, however, key content searches (which cover independent claims, advantages, drawbacks, and the main patent object) are limited to English-language publications.

Statistical comparisons of time series

This study considers 23 technologies, defined in Table \ref{table:search_terms}, for which literature evidence has been identified to classify the particular mode of technology substitution observed. The evidence and process used in this categorisation are outlined in detail in chapter 2 of [Insert reference to my thesis here]. Using bibliometric analysis methods, it is possible to extract a variety of historical trends for any technology of interest, effectively generating a collection of time series associated with that technology (these multidimensional time series datasets are referred to here as 'technology profiles'). This raises the question of how best to compare dissimilar bibliometric technology profiles in an unbiased manner, in order to investigate whether literature-based technology substitution groupings can be determined using a classification system built on the assumptions given in section \ref{585124}. In particular, comparisons of technology time series can be subject to one or more areas of dissimilarity: time series may be based on different numbers of observations (e.g. covering different time spans), be out of phase with each other, be subject to long-term and shorter-term cyclic trends, be at different stages of the Technology Life Cycle (or be fluctuating between stages) \cite{little1981strategic}, or be representative of dissimilar industries. A body of work already exists on the statistical comparison of time series, and in particular on time series classification methods \cite{lin2012pattern}. Most modern pattern recognition and classification techniques emerging from the machine learning and data science domains fall broadly within the categories of supervised, semi-supervised, or unsupervised learning.
Related to this, Appendix A provides an overview of current preprocessing, statistical significance testing, classification, feature alignment, clustering, cross-validation, and functional data analysis techniques for time series, giving further details of the considerations addressed in this study's methodology beyond those discussed directly in section \ref{330519}.

Method selection

Based on the technology classification problem considered, the bibliometric data available, and the methods discussed in Appendix A, the following methods have been selected for use in this analysis:

Technology Life Cycle stage matching process

For those technologies where evidence for determining the transitions between different stages of the Technology Life Cycle has either not been found or is incomplete, a nearest neighbour pattern recognition approach has been employed based on the work of Gao \cite{Gao_2013} to locate the points where shifts between cycle stages occur. However, for the technologies considered in this paper, literature evidence has been identified for the transitions between stages, and so the nearest neighbour methodology is not discussed further here.

Identification of significant patent indicator groups

In order to identify those bibliometric indicator groupings that could form the basis of a data-driven technology classification model, a combination of Dynamic Time Warping and the 'PAM' variant of K-Medoids clustering has been applied in this study. For the initial feature alignment and distance measurement stages of this process, Dynamic Time Warping is still widely recognised as the classification benchmark to beat (see Appendix A), and so this study does not look to advance the feature alignment processes used beyond this. Unlike the Technology Life Cycle stage matching process, which is based on a well-established technology maturity model, the analysis here assumes that a classification system based on the modes of substitution outlined in section \ref{585124} is not intrinsically valid. For this reason, an unsupervised learning approach has been adopted so that human biases can be removed when determining whether a classification system based on presumptive technological substitution is valid, before subsequently defining a classification rule system. This additionally means that predicted clusters can be labelled even if labels are only available for a small number of observed samples representative of the desired classes, or potentially even if none of the observed samples are absolutely defined. This is of particular use if the technique is to be expanded to a wider population of technologies, as obtaining evidence of the mode of substitution that gave rise to the current technology can be a time-consuming process, and in some cases the necessary evidence may not be publicly available (e.g. if dealing with commercially sensitive performance data). As such, clustering can provide an indication of the likely substitution mode of a given technology without the need for prior training on technologies that belong to any given class.
Under such circumstances, this approach could be applied without the need for collecting performance data, provided that the groupings produced by the analysis are broadly identifiable from inspection as being associated with the suspected modes of substitution (this is of course made easier if a handful of examples are known, but this is no longer a hard requirement). The 'PAM' variant of K-Medoids is selected here over Hierarchical clustering since the expected number of clusters is known from the literature, and keeping the number of clusters fixed allows for easier testing of how frequently predicted clusters align with expected groupings. Additionally, only a small sample of technologies is evaluated in this study, and as a result the computational expense of using the 'PAM' variant of K-Medoids rather than Hierarchical clustering approaches is not likely to be significant. It is also worth noting that, by evaluating the predictive performance of each subset of patent indicator groupings independently, it is possible to spot and rank commonly recurring patterns of subsets. This is not possible with approaches such as Linear Discriminant Analysis, which can assess the impact of individual predictors but cannot rank the most suitable combinations of indicators.
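
The combination of Dynamic Time Warping and 'PAM' clustering described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this study: the synthetic curves, the function names (dtw_distance, pairwise_dtw, pam), the absolute-difference local cost, and the simplified swap phase (which accepts any swap that lowers the total cost) are all hypothetical choices made for the example.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic-programming DTW with an absolute-difference local cost."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def pairwise_dtw(series):
    """Symmetric matrix of DTW distances between all series (lengths may differ)."""
    k = len(series)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])
    return dist


def pam(dist, k, max_iter=100):
    """Simplified PAM on a precomputed distance matrix: greedy BUILD, then SWAP."""
    n = dist.shape[0]
    # BUILD: the first medoid minimises total distance; the rest are added greedily.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        totals = [dist[:, medoids + [c]].min(axis=1).sum() for c in candidates]
        medoids.append(candidates[int(np.argmin(totals))])
    # SWAP: exchange a medoid with a non-medoid whenever the total cost decreases.
    best = dist[:, medoids].min(axis=1).sum()
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids.copy()
                trial[pos] = c
                trial_cost = dist[:, trial].min(axis=1).sum()
                if trial_cost < best - 1e-12:
                    medoids, best, improved = trial, trial_cost, True
        if not improved:
            break
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels


# Synthetic stand-ins for technology profiles (not data from this study):
# three S-shaped curves and three bell-shaped curves of differing lengths.
def s_curve(length, midpoint):
    t = np.arange(length)
    return 1.0 / (1.0 + np.exp(-0.5 * (t - midpoint)))


def bell(length, centre):
    t = np.arange(length)
    return np.exp(-((t - centre) ** 2) / 18.0)


profiles = [s_curve(30, 12), s_curve(36, 18), s_curve(40, 20),
            bell(30, 12), bell(36, 18), bell(40, 20)]
dist = pairwise_dtw(profiles)
medoids, labels = pam(dist, k=2)
```

Because the clustering only needs the precomputed distance matrix, series of different lengths and phases can be compared directly, and the number of clusters k is fixed in advance, in line with the rationale given above for preferring 'PAM' over Hierarchical clustering.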

Ranking of significant patent indicator groups

As the number of technologies considered in this study is relatively small, exhaustive cross-validation approaches provide a feasible means to rank the out-of-sample predictive capabilities of those bibliometric indicator subsets that have been identified as producing significant correlations to expected in-sample technology groupings. As such, leave-p-out cross-validation is applied for this purpose, which also reduces the risk of over-fitting in the subsequent model building phases \cite{Arlot_2010}.
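
To illustrate why exhaustive cross-validation is feasible for a sample of this size, the sketch below enumerates every leave-p-out split and scores a precomputed distance matrix. It is an assumption-laden example, not the procedure used in this study: the value p = 2, the function names (leave_p_out_splits, lpo_accuracy), the one-nearest-neighbour scoring rule, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import comb

import numpy as np


def leave_p_out_splits(n_samples, p):
    """Yield (train_idx, test_idx) for every way of holding out p samples."""
    all_idx = set(range(n_samples))
    for test in combinations(range(n_samples), p):
        train = sorted(all_idx - set(test))
        yield train, list(test)


def lpo_accuracy(dist, labels, p):
    """Mean 1-nearest-neighbour accuracy over all leave-p-out splits.

    dist is a precomputed (n x n) distance matrix and labels holds the
    expected group of each sample. The 1-NN rule is an assumed scoring choice.
    """
    labels = np.asarray(labels)
    correct = total = 0
    for train, test in leave_p_out_splits(len(labels), p):
        for t in test:
            nearest = train[int(np.argmin(dist[t, train]))]
            correct += int(labels[nearest] == labels[t])
            total += 1
    return correct / total


# Number of splits for 23 samples with p = 2 (p is an assumed value).
n_splits = sum(1 for _ in leave_p_out_splits(23, 2))

# Toy one-dimensional example: two well-separated groups of three points.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
toy_dist = np.abs(points[:, None] - points[None, :])
toy_labels = [0, 0, 0, 1, 1, 1]
toy_score = lpo_accuracy(toy_dist, toy_labels, p=2)
```

The number of splits grows as the binomial coefficient of the sample size and p, so this exhaustive approach stays tractable only while both remain small.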

Model building

Misalignment in time between the life cycle stages of different technologies can make it difficult to identify common features in their time series. This is primarily because such phase variance risks artificially inflating data variance, skewing the driving principal components and often disguising underlying data structures \cite{Marron_2015}. Consequently, due to the importance of phase variance when comparing historical trends for different technologies, and the coupling that exists between adjacent points in growth and adoption curves, functional linear regression is selected here to build the technology classification model developed in this study (see notes on Functional Data Analysis in Appendix A for further details).
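
The effect of phase variance on apparent data variance can be illustrated with a small numerical sketch. The example below is hypothetical and is not drawn from this study's data: it assumes three identical logistic growth curves that differ only in the timing of their midpoint, and uses a simple shift registration (re-expressing each curve relative to its own midpoint) as a stand-in for the alignment step.

```python
import numpy as np

# Shared calendar-time axis (arbitrary units) for three synthetic curves.
t = np.linspace(0.0, 40.0, 401)
midpoints = [12.0, 20.0, 28.0]  # assumed take-off timings


def logistic(time, midpoint, rate=0.5):
    """S-shaped growth curve; identical shape, shifted by its midpoint."""
    return 1.0 / (1.0 + np.exp(-rate * (time - midpoint)))


curves = np.array([logistic(t, m) for m in midpoints])

# Pointwise variance across curves on the shared time axis: this is driven
# entirely by timing differences, since the curve shapes are identical.
unaligned_var = curves.var(axis=0)

# Shift registration: resample each curve on a time axis measured relative
# to its own midpoint, then recompute the pointwise variance.
tau = np.linspace(-10.0, 10.0, 201)
aligned = np.array([np.interp(tau + m, t, c) for m, c in zip(midpoints, curves)])
aligned_var = aligned.var(axis=0)

print("max variance before alignment:", unaligned_var.max())
print("max variance after alignment: ", aligned_var.max())
```

In this idealised case all of the cross-curve variance disappears once the curves are aligned, which is the sense in which unaddressed phase variance can inflate variance and distort any subsequent principal component analysis.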