\documentclass[10pt,twocolumn,a4paper]{article}

\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}


\usepackage{times}


\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
            linkcolor = blue,
            urlcolor  = blue,
            citecolor = blue,
            anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
  \errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother


\usepackage{natbib}


\renewenvironment{abstract}
  {{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
  {\bigskip}

\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}


\usepackage{authblk}


\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}%

\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}

\usepackage[utf8]{inputenc}
\usepackage[english]{babel}


\usepackage{cleveref}
\usepackage{subfig,multicol}
\usepackage{caption}
\usepackage{subcaption}
\bibliographystyle{plainnat}

\begin{document}

\title{Identifying the mode and impact of technological substitutions (journal
paper v2)}


\author[1]{Ian Marr}%
\author[2]{chmcm}%
\author[2]{chris.mcmahon}%
\author[2]{sanjiv.sharma}%
\author[2]{m.lowenberg}%
\author[2]{superbobandlemming}%
\affil[1]{University of Bristol}%
\affil[2]{Affiliation not available}%


\vspace{-1em}


  \date{}


\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup


\sloppy


\section{Abstract}

{\label{994700}}

The introduction of new technologies into heavily regulated industries
such as aerospace is often a very complex, time-consuming and expensive
challenge that requires significant levels of research and development
in order to ensure a successful technology substitution. This challenge
is exacerbated when new technology options represent a fundamental shift
away from well-established principles, as the risk and uncertainties
involved increase significantly. This is currently the case in the
anticipated transition from conventional turbojet aircraft architectures
to all new electric configurations, and equally for the adoption of
technologies enabling mass manufacturing and customisation processes in
aerospace production lines. At the same time, the opportunities
associated with these disruptive or sustaining innovations may be
sufficient to warrant decision-makers adopting new technological
paradigms. In some cases, new technological paradigms arise even while
existing technological paradigms are still undergoing further
developments, and have not yet reached the peak of their performance.
This further complicates the decision for enterprises, as switching to a
new technological paradigm that may or may not out-perform the old one
presents great commercial risk. In this regard it is beneficial to be
able to identify early on whether a new technological paradigm is likely
to have scope for development beyond that of the current dominant
technology, and commercially, when the tipping point might occur where
the new paradigm would become the industry `mainstream' technology
option.

This paper examines historical cases where emerging technologies have
been presumed in advance to have development opportunities beyond those
of pre-existing technologies, subsequently leading to transitions
occurring before the performance of the existing technology has stagnated.
Based on conceptual models previously published considering the mode of
technological substitution and the relation to scientific and
technological developments, this paper looks to test whether
bibliometric measures of scientific and technological development can
provide an indication of the mode of adoption likely to occur.
Bibliometric, pattern recognition, statistical and other data-driven
analysis techniques are applied to technologies identified as having
been adopted as a result of either prior technological stagnation, or as
a result of a presumptive leap being made, in order to identify early
indicators of the mode of technological substitution. This has led to
the development of a functional linear regression model that can be used
in supporting technology strategy and innovation management by
indicating the likely mode of adoption from key technology development
indicators.

\section{Introduction}

{\label{649217}}

\subsection{Technology forecasting, substitution patterns, and
technological
failure}

{\label{677399}}

Technological substitution often plays an important role in the fortunes
of modern enterprises. Correctly predicting which technologies are
likely to be most influential can ensure that a firm is best positioned
to gain a large advantage over its competitors when the new technology
comes to fruition. Conversely, failure to anticipate the arrival of major
technological shifts can leave firms severely diminished. This is
illustrated by the dramatic impact on Kodak's business following the
introduction of digital photography, which rendered many of the firm's
existing film products obsolete despite an early lead in the digital
field that was not fully capitalised upon~\hyperref[csl:1]{(Lucas and Goh, 2009)}. Equally,
investing heavily in a nascent technology too soon can have grave
consequences, as Bertelsmann found from investing in
Napster~\hyperref[csl:2]{(Hall and Rosson, 2006)}. As such, forecasting techniques are often
used to determine strategies in large organisations by providing an
initial guide to future opportunities, risks, challenges, and areas of
uncertainty.

In this field, considerable work has already been undertaken on the
modelling of technology diffusion as part of these substitution events.
This has included, amongst many other areas of study, the influence of
successive technology generations, and the impact of time delays on the
perception of new technologies, as illustrated in
Fig.~{\ref{359340}} and
Fig.~{\ref{740770}} respectively.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1.00\columnwidth]{figures/Fig1/Fig1}
\caption{{Successive generations of technology substitutions~\protect\hyperref[csl:3]{(Bass, 2004)}
{\label{359340}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1.00\columnwidth]{figures/Fig2/Fig2}
\caption{{Technology S-curves and the impact of time delays on the perception of
new technologies~\protect\hyperref[csl:4]{(Datt{\'{e}}e and Weil, 2007)}
{\label{740770}}%
}}
\end{center}
\end{figure}

Classically, the introduction of new technologies is often described as
following an S-curve that assumes uptake is initially slow in the
earliest stages prior to a dominant design emerging, until performance
and functional benefits of the new technology are seen to be greater
than those of existing technologies, at which point uptake significantly
accelerates \hyperref[csl:5]{(Foster, 1986}; \hyperref[csl:6]{Utterback, 1994)}. This model assumes that eventually all
technologies then arrive at a limiting condition where they too begin to
stagnate as uptake reduces (potentially due to market saturation or
competition from new technologies), with substitution to a subsequent
generation of technologies occurring either before or after arriving at
this temporary plateau (see Fig.~{\ref{740770}}). This
brings about the notion of continual technological (or functional)
failure, at the point where a replacement technology is sought for the
current technological paradigm. However, the technological `failures'
that lead to this type of substitution vary greatly, and cannot just
assume a single simple definition. In this regard, previous work has
examined what is meant by `technological failure', and has broadly
categorised these occurrences into three main
definitions~\hyperref[csl:7]{(Gooday, 1998)}:

\begin{enumerate}
\tightlist
\item
  \textbf{`Failure' as a social taxonomy of marginalised technologies:}
  `Failure' is not an essential characteristic of the technology itself.
  Instead `failure' depends on a diverse range of usage factors that may
  not be replicated in other cultures, and is chronologically bounded so
  that any given technology can be classed as a success or failure at a
  given point in time according to social responses to it. This
  definition implies that `failure' is a completely unexceptional matter
  in technology, and that all `successful' technologies `fail' at some
  point in their existence~\hyperref[csl:7]{(Gooday, 1998)}
\item
  \textbf{`Failure' as a mundane feature of technological usage and
  development:} Persistent `failure' of technology is an unavoidable
  consequence of ever more demanding expectations that human users
  impose upon their all-too-limited constructions. As such, what `fails'
  is human expectations of hardware performance and distribution - or
  rather a `failure' of socio-technical relations~\hyperref[csl:8]{(Pye, 1978}; \hyperref[csl:7]{Gooday, 1998)}
\item
  \textbf{`Failure' as a perspectival and often contested attribution:}
  many recent sociological studies of technology employ two simplifying
  assumptions; firstly that there is a decisive closure point in history
  at which a technology is judged a `success' or a `failure', and
  secondly that at this point in time, all parties come to a decision
  that is ultimately consensual, despite being based on differing
  perceptions of the technology's social role. Both of these assumptions
  can be challenged by strong counter-arguments~\hyperref[csl:7]{(Gooday, 1998)}
\end{enumerate}

In the analysis that follows, this study focuses on the first of these
three conditions (whilst the other two are addressed to a greater extent
in separate technology adoption modelling work). Specifically, the
definition of technological failure used in this study is given as:

\begin{quote}
``A point in time at which technology performance development
stagnates/plateaus, with no further progressive trajectory improvements
foreseen for a significant period of time in comparison to the overall
technology lifecycle considered, which is subsequently followed by the
substitution of a new technology/architecture that is on a progressive
trajectory''
\end{quote}

This means that a technology has been able to reach what could be
observed to be a temporary performance limit in this condition before
substitution to a new discontinuous technology
occurs~\hyperref[csl:9]{(Schilling and Esmundo, 2009)}. This definition also follows on from the work
of Sood \& Tellis, which applied a sub-sampling approach to analyse
different types of `multiple S-curves', and subsequently concluded that
technologies tend to follow more of a step-function, with long periods
of static performance interspersed with abrupt jumps in performance,
rather than a classical S shape. In that study, stagnation periods were
recorded where technology performance during a given sub-sample had an
upper plateau longer in duration than the immediately preceding growth
phase, whilst the subsequent jump in performance in the year immediately
after the plateau was almost double the performance during the entire
plateau~\hyperref[csl:10]{(Sood and Tellis, 2005)}. Other studies, including the work of Chang
and Baek and of Schilling and Esmundo, classify multiple S-curves based on whether successive
curves intersect or are disconnected~\hyperref[csl:11]{(Chang and Baek, 2010}; \hyperref[csl:9]{Schilling and Esmundo, 2009)}.

\subsection{Anomalies associated with scientific and technological
crisis}

{\label{646617}}

Up to this point, only substitution patterns associated with technological
failure have been discussed. However, previous studies have identified
that technological substitutions are not just the result of the existing
technology being deemed to have `failed'. In this sense, Edward Constant
argued that a feature common to all technological revolutions was the
emergence of `technological anomalies', which could be traced to either
scientific or technological crisis. The first, and most common, cause of
these technological anomalies is functional failure, where:

\begin{quote}
``either the conventional paradigm proves inappropriate to `new or more
stringent conditions', or an individual intuitively assumes that (s)he
can produce a better or a new technological device''~\hyperref[csl:12]{(Constant, 1973)}
\end{quote}

Alternatively, technological anomalies can arise as a result of
presumptive technological leaps:

\begin{quote}
``The demarcation between functional-failure anomaly and presumptive
anomaly is that presumptive anomaly is deduced from science before a new
paradigm is formulated and that scientific deduction is the sole reason
for the sole guide to new paradigm creation. No functional failure
exists; an anomaly is presumed to exist, hence presumptive
anomaly''~\hyperref[csl:12]{(Constant, 1973)}
\end{quote}

Whilst technological revolutions may originate from either scientific or
technological crisis, a critical area of commonality lies in the
anomaly-crisis process observed in both conditions:

\begin{quote}
``in both science and technology anomaly causes certain individuals to
reject the conventional paradigm and to create new paradigms, and, in
each, crisis may lead to revolution''~\hyperref[csl:12]{(Constant, 1973)}
\end{quote}

The type of crisis that emerges is dependent on which type of anomaly
precedes it. Scientific crisis can occur irrespective of whether an
alternative theoretical framework exists or not when a persistent,
unresolved, scientific anomaly successfully refutes an established
theory. In this condition the crisis is directly linked to the anomaly.
However, technological anomaly and crisis are rarely so logically
driven, and can arise in conditions where existing technological
paradigms are still performing favourably. This is illustrated by the
turbojet revolution of the 1930s and 1940s where piston-engine
developments had provided remarkable performance improvements and
continuing success, but were superseded by scientific advances that were
directly responsible for the radical technological changes that
followed. In addition, in order for a technological anomaly to provoke a
technological crisis, a convincing alternative paradigm must exist, so
that the relative functional failure of the conventional system is
observable. As such, the alternative technological paradigm instigates
the crisis, whilst the technological anomaly may only be seen as
speculation or as a limiting condition to the normal
technology~\hyperref[csl:12]{(Constant, 1973)}.

\subsection{Modes of substitution}

{\label{585124}}

Based on the definitions of functional failure and presumptive anomaly
described in sections~{\ref{677399}}
and~{\ref{646617}}, this study examines the ability to
distinguish between these two modes of substitution (i.e. reactive or
presumptive) from analysis of historical scientific and technological
data. Table~{\ref{table:technology_categories}} uses
these definitions and performance evidence obtained from literature to
classify a sample set of technologies according to the mode of
substitution observed.\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{9cm}|p{9cm}}
    {Examples of technological failure} & {Examples of presumptive anomaly} \\ \midrule
    Plug-compatible market (PCM) disk drives \hyperref[csl:13]{(Christensen and Rosenbloom, 1995)} & Transition from piston engine to jet engine \hyperref[csl:12]{(Constant, 1973)} \\
    \hline
    Transition to fibre optic cables from Cu/Al wires for data transfer \hyperref[csl:10]{(Sood and Tellis, 2005)} & Transition to optical undersea cables from coaxial cables \hyperref[csl:11]{(Chang and Baek, 2010)} \\
    \hline
    Transition to Low Pressure Sodium lights from Tungsten Filament Lamps \hyperref[csl:11]{(Chang and Baek, 2010)} & Hydrodynamics, water turbines, and turbine pumps \hyperref[csl:12]{(Constant, 1973)} \\
    \hline
    Transition to Compact Fluorescent Lamps from Tungsten Filament Lamps \hyperref[csl:11]{(Chang and Baek, 2010)} & Thermodynamics, steam, and early gas engines \hyperref[csl:12]{(Constant, 1973)} \\
    \hline
    Transition to White LED lighting from Low Pressure Sodium and Compact Fluorescent Lamps \hyperref[csl:11]{(Chang and Baek, 2010)} & Organic chemistry and catalytic petroleum cracking \hyperref[csl:12]{(Constant, 1973)} \\
    \hline
    Transition to hypersonic aircraft from supersonic \hyperref[csl:11]{(Chang and Baek, 2010)} & Transition to the transistor from the vacuum tube \hyperref[csl:14]{(Foster, 1985)} \\
    \hline
    Transition to coaxial undersea cables from single cable \hyperref[csl:11]{(Chang and Baek, 2010)} & Nuclear physics and atomic energy \hyperref[csl:12]{(Constant, 1973)} \\
    \hline
    Transition to T-carrier system from modem internet access \hyperref[csl:11]{(Chang and Baek, 2010)} & Renewable energy sources \\
    \hline
    Transition to Synchronous Optical Networking (SONET) system from T-carrier internet access \hyperref[csl:11]{(Chang and Baek, 2010)} & Electric vehicles \\
    \hline
    Transition to ink jet and laser printers from dot matrix printers \hyperref[csl:10]{(Sood and Tellis, 2005)} &  \\
\end{tabular}%}
\caption{{Identified examples of technological failure and presumptive anomaly}}
\label{table:technology_categories}
\end{table*}

In addition to the modes of substitution outlined in
Table~{\ref{table:technology_categories}}, other
technologies have been identified as `non-starters': these are
technologies that were never mass commercialised. In many cases these
technologies could have been adapted for the target markets considered
but were either never used or failed to demonstrate the required
features, or the performance and cost improvements necessary to warrant
further development beyond initial trials. Examples of non-starter
technologies include wire recorders as an alternative to magnetic tape
technology and chain printers as an alternative to dot matrix printers.
In the case of wire recorders, this format failed to take off after it
was excluded from the standard-setting process in favour of magnetic
tape technology, leading to ``technological lock-out'', whilst early
chain printers were quickly eclipsed by the superior performance of the
dot matrix design. Non-starters are excluded from this study, as the
analysis that follows classifies individual technologies that are known
to have been successfully commercialised. As such, it is not believed
that their inclusion would
influence the results presented here, although non-starters would need
to be included to reduce uncertainty in the classification of
emerging technologies~\hyperref[csl:10]{(Sood and Tellis, 2005)}.

Based on Constant's hypothesis regarding scientific and technological
anomalies and their influence on the mode of technological substitution,
this paper looks to test whether bibliometric measures of scientific and
technological development can provide an indication of the mode of
adoption likely to occur. Consequently, this study theorises that in
order to identify cases of technological substitution arising from
presumptive anomaly a classification scheme would need to consider:

\begin{enumerate}
\tightlist
\item
  a population's perception of the current rate of scientific
  development in observed domains~\hyperref[csl:12]{(Constant, 1973)}
\item
  a population's perception of the current rate of technological
  development in observed domains~\hyperref[csl:12]{(Constant, 1973)}
\end{enumerate}

\subsection{Measuring perceptions of limits of science and
technology}

{\label{870633}}

Many indicators of science and technological progress have been
developed in the fields of bibliometrics and scientometrics in recent
decades. Whilst these have largely been developed for the purposes of
identifying and targeting gaps in existing knowledge, as well as for
determining the effectiveness of funding in specific fields of research,
they also provide a systematic approach to compare development trends
across a broad range of scientific domains. When attempting to measure
science it is, however, important to ensure that any measurements taken
are suitable indicators of the development characteristics that are
being studied. In this regard, conceptual distinctions exist between
scientific activity, scientific production, and scientific
progress~\hyperref[csl:15]{(Martin, 1996)}:

\begin{enumerate}
\tightlist
\item
  \textbf{Scientific activity:} consumption of the inputs to basic
  research (e.g. related to the number of scientists involved, level of
  funding, support staff and equipment)
\item
  \textbf{Scientific production:} extent to which consumption of
  resources creates a body of scientific results. Results are embodied
  both in research publications and in other types of less formal
  communication between scientists
\item
  \textbf{Scientific progress:} extent to which scientific activity
  results in substantive contributions to scientific knowledge
\end{enumerate}

Based on this, indicators of scientific progress, such as citation
analysis, are normally considered most appropriate for assessing
scientists' success in producing new scientific knowledge and for
identifying emerging areas of development, leading to their common usage
in the tenure review process~\hyperref[csl:16]{(Narin and Hamilton, 1996)}. At the same time,
simple publication counts are considered to provide a reasonable measure
of scientific production, but are thought to be much less adequate as an
indicator of contributions to scientific progress due to the unclear
value of each publication's individual contribution to knowledge.
Publication counts actually reflect both the level of scientific
progress made by an individual or group and a number of other
factors relating to the social and political pressures behind a study
(e.g. publication practices of the employing institution, country and
research area, or emphasis placed on publications for obtaining
promotion or grants)~\hyperref[csl:17]{(Verbeek et al., 2002}; \hyperref[csl:15]{Martin, 1996)}. Realistically, these
extraneous factors cannot be assumed to be small in comparison to the
scientific claims made, nor can their effects be assumed to be randomly
distributed and to cancel out~\hyperref[csl:15]{(Martin, 1996)}. However, in this study the emphasis
is not on assessing the performance or influence of a specific set of
papers, but rather on gauging the adoption of the field as a whole. As
technology diffusion models also rely on non-invested parties being made
aware of scientific and technological progress, communication and
promotion of scientific research are important factors to include in
adoption processes~\hyperref[csl:3]{(Bass, 2004)}. Adoption is equally dependent on
perceptions of current scientific and technological rates of
progress~(shaped by social and political pressures, as well as
technical ones), rather than the actual rates of progress (shaped by
technical contributions to knowledge). Lastly, diffusion effects depend
on population size, word of mouth, and
time~\hyperref[csl:3]{(Bass, 2004)}. As a result, measures of scientific
production are felt to be a more relevant indication of likelihood to
adopt than measures of scientific progress, although they could also
indicate a potentially contentious or controversial topic that is
generating many differing opinions. However, controversy does not
necessarily prevent adoption, and in some cases may accelerate
substitution mechanisms~\hyperref[csl:2]{(Hall and Rosson, 2006)}. Consequently, for the
purposes of this study the scientific production associated with debate
over contentious or controversial technologies is not believed to
significantly skew the trends presented here in either direction away
from the intended simplified reflection of real-world adoption
characteristics.
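
The dependence of diffusion on population size, word of mouth, and time can be made explicit with the standard form of the Bass diffusion model. The expressions below are a reference sketch rather than the model fitted in this study; the symbols $m$, $p$, $q$ and $F(t)$ follow the usual Bass notation and are not taken from the analysis presented here. With $F(t)$ the cumulative fraction of eventual adopters at time $t$ and $f(t) = \mathrm{d}F/\mathrm{d}t$,
\begin{equation*}
\frac{f(t)}{1 - F(t)} = p + q\,F(t), \qquad n(t) = m\,f(t),
\end{equation*}
where $m$ is the size of the potential adopter population, $p$ is the coefficient of innovation (external influence, such as publicity), $q$ is the coefficient of imitation (word of mouth), and $n(t)$ is the rate of adoption. For $F(0) = 0$ this gives
\begin{equation*}
F(t) = \frac{1 - e^{-(p+q)t}}{1 + (q/p)\,e^{-(p+q)t}},
\end{equation*}
so that uptake is driven initially by $p$ and increasingly by the word-of-mouth term $q\,F(t)$ as awareness of the technology spreads.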

\section{Methodology}

{\label{274383}}

\subsection{Statistical comparisons of time
series}

{\label{204737}}

This study considers 23 technologies where literature evidence has been
identified to classify the particular mode of technology substitution
observed. Using bibliometric analysis methods it is possible to extract
a variety of historical trends for any technologies of interest,
effectively generating a collection of time series data points
associated with a given technology (these multidimensional time series
datasets are referred to here as `technology profiles'). This raises the
question of how best to compare dissimilar bibliometric technology
profiles in an unbiased manner in order to investigate whether
literature-based technology substitution groupings can be determined
using a classification system built on the assumptions given in
section~{\ref{585124}}. In particular, comparisons of
technology time series can be subject to one or more areas of
dissimilarity: time series may be based on different numbers of
observations (e.g. covering different time spans), be out of phase with
each other, be subject to long-term and shorter-term cyclic trends,
be at different stages of the Technology Life Cycle (or be
fluctuating between different stages)~\hyperref[csl:18]{(Little, 1981)}, or be
representative of dissimilar industries. A body of work already
exists on the statistical comparison of time series, and in particular
on time series classification methods~\hyperref[csl:19]{(Lin et al., 2012)}. Most modern time
series pattern recognition and classification techniques emerging from
the machine learning and data science domains broadly fall within the
categories of supervised, semi-supervised, or unsupervised learning
approaches.

\subsubsection{Preprocessing and statistical significance testing of
time series
classifications}

{\label{606353}}

Beyond the principal methods of classification outlined in
\hyperref[csl:19]{(Lin et al., 2012)}, the preprocessing of time series datasets and means
of statistical significance testing must also be considered.
Preprocessing of data in particular is still an area that divides
opinion within the statistics community, with some experts arguing that
transformation, smoothing, and normalisation of datasets is required for
unbiased time series comparisons, whilst others contend that in doing so
a lot of information is removed that could otherwise be captured in
error terms and that correlations may be over-stated~\hyperref[csl:20]{(Lucero and Koenig, 2000}; \hyperref[csl:21]{Ramsay et al., 2009}; \hyperref[csl:22]{``{Smoothing Data, Filling Missing Data, and Nonparametric Fitting}'', n.d.}; \hyperref[csl:23]{``{When and why do we need data normalization?}'', 2013}; \hyperref[csl:24]{``{Smoothing - when to use it and when not to?}'', 2015)}.
If focusing on long-term trends, it is often recommended that analysis
is based on either logarithms or inverse hyperbolic sine transformations
of time series data rather than raw data in order to reduce focus on
short cyclic features~\hyperref[csl:25]{(Ramsay, 2013}; \hyperref[csl:26]{Nau, n.d.}; \hyperref[csl:27]{Hyndman, 2010}; \hyperref[csl:28]{``{Log transformation of values that include 0 (zero) for statistical analyses?}'', 2014)}. Similarly, simple moving
averages are thought to be more appropriate than exponential smoothing
for long-term trends if smoothing is to be applied~\hyperref[csl:29]{(Twomey, n.d.)}.
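
For reference, the transformations and smoothers mentioned above take the following standard forms. The symbols $x_t$, $k$ and $\alpha$ are notation introduced here for illustration, and no particular window length or smoothing constant is implied for this study. The inverse hyperbolic sine is
\begin{equation*}
\operatorname{arsinh}(x) = \ln\!\left(x + \sqrt{x^{2} + 1}\right),
\end{equation*}
which equals zero at $x = 0$ and approaches $\ln(2x)$ for large $x$, so it behaves like a logarithm while remaining defined for the zero counts that can occur in early publication records. A simple moving average over a window of $k$ observations and single exponential smoothing with constant $0 < \alpha \le 1$ are, respectively,
\begin{equation*}
\bar{x}_t = \frac{1}{k}\sum_{i=0}^{k-1} x_{t-i}, \qquad s_t = \alpha\,x_t + (1-\alpha)\,s_{t-1}.
\end{equation*}
The moving average weights the last $k$ observations equally, whereas exponential smoothing weights recent observations more heavily.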

A key data preparation requirement considered in this analysis relates
to the definition of shared curve features from bibliometric data that
can be used to address the time series and Technology Life Cycle
alignment issues highlighted in section~{\ref{204737}}.
These feature recognition and alignment processes are required to enable
fair comparisons and classification to be based on dissimilar
technologies. To ensure consistency, feature recognition processes
should consider the relative height of plateaus observed between
technology profiles from different industries, the rates of growth
observed in the early stages of historical trends, and the influence of
noise and incomplete time series data on the classifications being made.
For these reasons it is assumed that unsmoothed, amplitude-normalised
time series that are subsequently segmented based on common curve
features would enable these comparisons to be made. This approach would
ensure that all curve amplitudes considered are relative on a global
scale, whilst segmentation based on common features would enable
consistency in defining early growth phases whilst allowing later
incomplete segments to be discarded from classifications. As a basis for
these feature extraction stages it is assumed that the Technology Life
Cycle model proposed by Little provides a well-established concept and a
sensible candidate for identification of common curve
features~\hyperref[csl:18]{(Little, 1981)}. However, identified curve features may
still be unaligned in time, and consequently time transformation
techniques, such as `time warping' methods, are also recommended (this
is discussed in more detail in section~{\ref{367729}}).

In terms of being able to determine correlations between groups of time
series datasets, the Chi-square statistic is commonly used to test the
independence of descriptive statistics derived from time series (time
series classifiers are discussed in more detail in
section~{\ref{367729}}). However, as a consequence of
the probability distribution function used in its significance test, the
Chi-square approach is best suited to confusion matrices (i.e.
cross-tabulated comparisons of predicted classifications against target
classifications) in which all expected cell counts are greater than or equal
to five. As such, when smaller sample sizes are considered (such as the
23 technologies considered in this analysis), Fisher's exact test is
more appropriate. In a similar fashion to the Chi-square test, Fisher's
exact test is able to determine the significance of outcomes for samples
taken at random from a population, but is not necessarily able to
provide a ranking of the most statistically robust predictors (i.e.
predictors that are likely to be accurate when considering out-of-sample
predictions). It is worth noting that in this analysis technologies have
been deliberately selected based on their observed performance trends,
and as such Fisher's exact test cannot be used to reject the null
hypothesis (as samples are not being taken at random from a
population) unless known time series classification labels are removed
so that clustering is not based on human biases (i.e. an unsupervised
learning approach). For subsequent ranking of predictors based on small
sample sizes, cross-validation approaches are then required (discussed
in more detail in section~{\ref{258858}}). Histograms
can also prove useful for determining the most frequently occurring
individual factors in these cross-validation `bootstrapping' processes,
but cannot indicate what combination of factors would work best
together.
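
To illustrate why small samples favour Fisher's exact test, consider a $2 \times 2$ confusion matrix with cell counts $a$, $b$ (first row) and $c$, $d$ (second row), and $n = a+b+c+d$. The notation and the numbers in the example that follows are hypothetical and are not results of this study. The expected count used by the Chi-square test for cell $(i,j)$, and the probability of the observed table under Fisher's exact test with fixed margins, are
\begin{equation*}
E_{ij} = \frac{R_i\,C_j}{n}, \qquad
P = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}},
\end{equation*}
where $R_i$ and $C_j$ are the row and column totals. For a hypothetical table with $a=9$, $b=1$, $c=2$, $d=11$ ($n=23$), the first expected count is $E_{11} = (10 \times 11)/23 \approx 4.8$, which already falls below five, whereas Fisher's test evaluates the table exactly: $P = \binom{10}{9}\binom{13}{2}/\binom{23}{11} = 780/1352078 \approx 5.8 \times 10^{-4}$. The reported $p$-value is then the sum of $P$ over all tables with the same margins that are at least as extreme as the one observed.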

\subsubsection{Time series classification and feature alignment
techniques}

{\label{367729}}

In order to identify and rank the predictive ability of different
combinations of bibliometric indicators when used for classification
purposes, an appropriate classifier first has to be selected that fits
the data features being considered. In this sense time series
classification procedures can be grouped based on the type of
discriminatory features the techniques are attempting to find, as
outlined in~\hyperref[csl:30]{(Bagnall et al., 2016)}. This recent benchmarking analysis has
found that few time series classification algorithms perform better than
the Dynamic Time Warping and Rotation Forest benchmark classifiers,
whilst the best alternative (COTE) was identified as being hugely
computationally expensive~\hyperref[csl:30]{(Bagnall et al., 2016)}. It is worth noting that
feature alignment techniques that calculate relative feature-based
distance measures between time series (such as whole series and interval
approaches) can be used to calculate single value representations of the
similarity between any given pair of time series, including complex time
series with multiple dimensions, which can subsequently be used in
further clustering or wider classification analysis.

In the case of Dynamic Time Warping, feature alignment is achieved by
stretching portions of two signals,~\emph{X} and~\emph{Y}, onto a shared
set of instances such that a global signal-to-signal distance measure is
minimised. The set of distortion paths used in this minimisation problem
is based on a lattice of all possible distances between
the~\emph{m}\textsuperscript{th} data point of~\emph{X} and
the~\emph{n}\textsuperscript{th} data point of~\emph{Y}. Valid warping
paths, parameterised by two sequences of the same length, are
combinations of ``chess king'' moves that completely align the signals,
do not skip any data points, and do not repeat any signal
features.~In determining the path with the minimum accumulated distance, the
algorithm forces similar features to appear at the same location on a
common time axis~\hyperref[csl:31]{(MathWorks, 2016)}.
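
The minimisation described above is commonly solved by dynamic programming. As a sketch in assumed notation (with $d(x_m, y_n)$ the local distance between the $m$\textsuperscript{th} point of $X$ and the $n$\textsuperscript{th} point of $Y$, and $D(m,n)$ the accumulated cost of the best path reaching lattice cell $(m,n)$), the standard textbook recurrence is:

\begin{equation*}
    D(m,n) = d(x_m, y_n) + \min\left\{ D(m-1,n),\; D(m,n-1),\; D(m-1,n-1) \right\}
\end{equation*}

with $D(1,1) = d(x_1, y_1)$ and cells outside the lattice treated as having infinite cost. The three terms inside the minimum correspond to the vertical, horizontal, and diagonal ``chess king'' moves, and for signals of lengths $M$ and $N$ the value $D(M,N)$ is the global signal-to-signal distance. This is the generic formulation rather than a description of any specific library implementation.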

\subsubsection{Time series clustering
techniques}

{\label{685275}}

As a form of unsupervised learning, clustering approaches enable
associations between time series to be identified without being
subjected to human grouping biases. However, in order to apply
clustering techniques it is necessary to be able to describe the
relationships between successive pairs of time series using single value
representations. Consequently time series clustering techniques tend to
be based on measures of the relative distance between curves, rather
than the curve data points themselves. There is also considerable
variation in the outcomes depending on the clustering algorithm selected
for use. This can be in terms of the real-world interpretation of the
groupings generated, as observed when comparing clusters predicted using
the K-means and K-medoids algorithms.
Fig.~{\ref{943632}}~illustrates how the centre of
subsets in K-means is equivalent to the mean of measurements in the
subset (the centroid), rather than an actual member of the subset (a
medoid). As such K-means is not appropriate for application to time
series, as the algorithm ends up minimising variance, rather than
distances between curves \hyperref[csl:32]{(MathWorks, 2016}; \hyperref[csl:33]{``{Dynamic Time Warping Clustering}'', 2015)}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.84\columnwidth]{figures/K-Medoids-vs-K-Means-2/K-Medoids-vs-K-Means-2}
\caption{{Differences in real-world interpretations of K-means and K-medoids
clustering algorithms
{\label{943632}}%
}}
\end{center}
\end{figure}
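
The distinction illustrated in Fig.~{\ref{943632}} can also be stated through the objective that each algorithm minimises. In the following sketch the symbols are assumed notation: $C_j$ is the $j$\textsuperscript{th} of $K$ subsets, $\mu_j$ its mean, $m_j$ its medoid, and $\delta(\cdot,\cdot)$ any pairwise dissimilarity (for example a Dynamic Time Warping distance).

\begin{align*}
    J_{\text{means}} &= \sum_{j=1}^{K} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2, \quad \mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x \\
    J_{\text{medoids}} &= \sum_{j=1}^{K} \sum_{x \in C_j} \delta(x, m_j), \quad m_j \in C_j
\end{align*}

Because K-medoids only requires the pairwise values $\delta(x, m_j)$, it can operate directly on a precomputed distance matrix, whereas K-means needs the observations themselves in order to form $\mu_j$.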

Besides predicting alternative central points for subsets, and
consequently grouping alternative subset members, the number of clusters
predicted can also vary depending on the algorithm selected. Whereas
K-means and K-medoids require the number of clusters to be specified in
advance, hierarchical clustering approaches automatically determine the
number of clusters to group data points into without additional human
intervention~\hyperref[csl:32]{(MathWorks, 2016}; \hyperref[csl:33]{``{Dynamic Time Warping Clustering}'', 2015)}. Furthermore, as a form of unsupervised
learning, clustering approaches will provide different group labels to
subsets each time they are applied, even if the actual subset members
remain unchanged, so a separate~`subset~mapping' function based on
`Hamming distance' is required to ensure consistency in comparisons
between generated clusters and expected groupings. Once again it is also
worth noting that the definition of subsets using any clustering
technique will only be valid if time series are being compared on
comparable features rather than incomplete time series data. As such,
time series segmentation based on shared features or imputation of
missing data are again prerequisites for meaningful analysis, ensuring
that only completed segments are used in defining subsets. Finally, if
using feature-based distance measures as the basis for clustering
(grouped into matrices of distance points relating each technology time
series to every other time series)~then it is generally suggested that
either hierarchical clustering or the `Partitioning Around Medoids'
(PAM) variant of K-Medoids is applied to the descriptive
data~\hyperref[csl:32]{(MathWorks, 2016}; \hyperref[csl:33]{``{Dynamic Time Warping Clustering}'', 2015)}.
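
As a sketch of the subset mapping step (the notation is assumed here, and the implementation used in this study may differ), let $c_i$ be the arbitrary label assigned by the clustering algorithm to technology $i$, $g_i$ the expected group, and $\pi$ a one-to-one relabelling of the $K$ cluster labels. The mapping that best reconciles the two labellings minimises the Hamming distance between them:

\begin{equation*}
    \pi^{*} = \arg\min_{\pi} \sum_{i=1}^{N} \mathbf{1}\left[\pi(c_i) \neq g_i\right]
\end{equation*}

where $\mathbf{1}[\cdot]$ equals one when its argument is true and zero otherwise. For a small number of clusters all $K!$ relabellings can be checked directly, and the minimised sum is the number of technologies whose cluster differs from the expected grouping.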

\subsubsection{Cross-validation
techniques}

{\label{258858}}

To assess the predictive performance of any given combination of
bibliometric indicators in practice it is necessary to determine how
the~classification results will generalise to an independent (i.e.
unknown) data set. For this purpose, cross-validation techniques are
commonly employed to provide an indication of model validity when
considering out-of-sample predictions. This is accomplished by
sequentially~training and then generating test predictions from
different subset decompositions of the original data, and using the
average number of misclassified observations as a means to rank each
predictor grouping. In doing so, cross-validation helps to address the
risk of over-fitting models that are based on limited sample sizes, but
equally provides a means to identify the most suitable predictor
groupings to use for model building purposes based on their robustness
to misclassifications. Cross-validation techniques are generally grouped
into either exhaustive or non-exhaustive categories, as shown in Table
{\ref{table:cross-validation_techniques}}.\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{8cm}|p{8cm}}
    {Exhaustive cross-validation approaches} & {Non-exhaustive cross-validation approaches} \\ \midrule
    \hline
    Leave-p-out cross-validation & k-fold cross-validation \\
    \hline
    Leave-one-out cross-validation (the least computationally expensive version of leave-p-out cross-validation) & Holdout method \\
    \hline
     & Monte Carlo (repeated random sub-sampling) \\
\end{tabular}%}
\caption{{Common cross-validation techniques}}
\label{table:cross-validation_techniques}
\end{table*}
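
The ranking criterion described above can be written compactly, expressed here as a misclassification rate. In this sketch (assumed notation) the data are split $S$ times, $V_s$ is the validation set of split $s$, $y_i$ is the target class of observation $i$, and $\hat{y}_i^{(s)}$ is the class predicted by the model trained on the remaining observations:

\begin{equation*}
    \widehat{\mathrm{Err}} = \frac{1}{S} \sum_{s=1}^{S} \frac{1}{|V_s|} \sum_{i \in V_s} \mathbf{1}\left[\hat{y}_i^{(s)} \neq y_i\right]
\end{equation*}

Predictor groupings with a lower $\widehat{\mathrm{Err}}$ are ranked as more robust. Exhaustive approaches evaluate every admissible split, whereas non-exhaustive approaches evaluate only a sample of them.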

Some known limitations have to be taken into consideration when applying
cross-validation techniques. In particular, cross-validation
approaches~only yield meaningful results if the validation set and
training set used are drawn from the same population (without overlap
between sets), and if human biases are controlled. For example, it is
unrealistic to treat data as being drawn from the same population when
using dissimilar time periods for validation and training sets, as this
shift in time will introduce systematic differences into the sets being
considered. As such, alignment of features to ensure consistency is
again advisable for fair comparisons of time series. Similarly, training
models based on a specific group of a population~(e.g. young people)
does not enable generalisation of cross-validated training results to
the wider population, as predictions could differ greatly from actual
results.

\subsubsection{Functional data analysis}

{\label{419943}}

Most statistical analysis techniques assume that the data points being
evaluated are unrelated, and can be treated as independent entities.
This is not generally true of time series, where there is often a
derivative function that connects adjoining data points together. To
address these scenarios, functional data analysis approaches were
developed to enable statistical analysis and model construction based on
whole functions rather than a collection of independent data points,
making these approaches well suited to time series
data~\hyperref[csl:21]{(Ramsay et al., 2009)}. Additionally, functional data analysis has
proved to be suitable for conditions where phase variations are present
in data (such as in growth data and historical trends where curves start
at different times/stages). Methods such as nonlinear mixed models,
repeated measure ANOVA, and principal components analysis do not
consider these differences in timing~\hyperref[csl:34]{(``{When/where to use functional data analysis?}'', 2012)}.

Functional data approaches are built on the principle of using `basis
functions' to represent data series as a `functional data
object'~\hyperref[csl:21]{(Ramsay et al., 2009)}. A function is represented
as a linear combination of basis functions:

\begin{equation}
\label{eq:basis_function_1}
    f\left(t\right)=\sum_{i=1}^{k} \beta _ib_i\left(t\right)
\end{equation}

where~\(b_i\left(t\right)\)~are the known basis functions, and~\(\beta_i\) are
the estimated coefficients. This is often also written, with coefficients~\(a_i\)
and basis functions~\(\theta_i\left(t\right)\), as:

\begin{equation}
\label{eq:basis_function_2}
    f\left(t\right)=a_1\theta _1\left(t\right)+a_2\theta _2\left(t\right)+\dots+a_k\theta _k\left(t\right)
\end{equation}

Functional data objects can subsequently be used in functional linear
regression analysis, in an analogous way to conventional linear
regression:

\begin{equation}
\label{eq:linear_regression}
    y = X \beta + \varepsilon
\end{equation}
where
\[
    y = \begin{pmatrix}
            y_1 \\
            y_2 \\
            \vdots \\
            y_n
        \end{pmatrix},
    X = \begin{pmatrix}
            x_1^T \\
            x_2^T \\
            \vdots \\
            x_n^T
        \end{pmatrix}
    =   \begin{pmatrix}
            x_{11} & \cdots & x_{1p} \\
            x_{21} & \cdots & x_{2p} \\
            \vdots & \ddots & \vdots \\
            x_{n1} & \cdots & x_{np}
        \end{pmatrix},
\]

\[
    \beta = \begin{pmatrix}
                \beta_1 \\
                \beta_2 \\
                \vdots \\
                \beta_p
            \end{pmatrix},
    \varepsilon =   \begin{pmatrix}
                        \varepsilon_1 \\
                        \varepsilon_2 \\
                        \vdots \\
                        \varepsilon_n
                    \end{pmatrix}
\]
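
By analogy with Eq.~\ref{eq:linear_regression}, one common functional counterpart (given here as an assumed illustrative form for a scalar response, not necessarily the exact model fitted in this study) replaces the finite sum over $p$ predictors with an integral over the time domain $\mathcal{T}$:

\begin{equation*}
    y_i = \alpha + \int_{\mathcal{T}} x_i(t)\,\beta(t)\,dt + \varepsilon_i
\end{equation*}

Here $x_i(t)$ is the functional data object for the $i$\textsuperscript{th} observation, $\alpha$ is an intercept, and $\beta(t)$ is a coefficient function that is itself expanded in basis functions as in Eq.~\ref{eq:basis_function_1}, so that estimating $\beta(t)$ reduces to estimating a finite set of coefficients.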

The exact definition of basis functions used in functional data objects
depends closely on the type of data or feature that the functional data
objects are intended to represent. At the most basic level, Fourier
series are commonly used for periodic and near-periodic data (such as
weather data and some economic data), whilst spline-based functions
are used for non-periodic
data~\hyperref[csl:21]{(Ramsay et al., 2009)}. Beyond these high-level
distinctions, polynomial, B-spline
(essentially built up of many polynomial sections), and
wavelet functions can also be considered. B-splines have been found to be
better suited to fitting highly curved data, where polynomials would
require a large number of basis functions to achieve the same degree of
fit, and as such splines have now largely replaced polynomials. Wavelets
have been observed to be very good at capturing sharp edges, which is a
particular weakness of Fourier-based
functions~\hyperref[csl:21]{(Ramsay et al., 2009)}. If using B-splines, it is
necessary to first define the number
of `knots' that should be used in the representation of a curve (i.e.
the joining points linking adjacent polynomial segments in the spline).
Setting the number of knots equal to the total number of
observations in a time series keeps this definition simple, although this may
again result in a large number of basis functions depending on the
length of the series considered.
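
To make the choice of basis concrete, two standard cases are sketched here in assumed notation. A Fourier basis with period $T$ (so that $\omega = 2\pi/T$) uses the functions

\begin{equation*}
    1,\; \sin(\omega t),\; \cos(\omega t),\; \sin(2\omega t),\; \cos(2\omega t),\; \dots
\end{equation*}

which is why it suits periodic data. For a B-spline basis, the number of basis functions $k$ is determined by the spline order and the number of interior knots:

\begin{equation*}
    k = \text{order} + \text{number of interior knots}
\end{equation*}

For example, cubic splines (order four) with a knot at each of $n$ observations, of which $n-2$ are interior, give $k = n + 2$ basis functions, illustrating how the basis grows with the length of the series.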

With regard to best implementation practices for functional data
analysis, the work of Ramsay presents several recommendations that
should be considered when applying these techniques. Firstly, it
is advised that the order of the B-spline functions be at least four
greater than the order of the highest derivative to be considered
in any analysis, in order to properly capture any significant influences
from derivative
behaviours~\hyperref[csl:21]{(Ramsay et al., 2009)}. Another important
point raised in this literature is the need to
scale time vectors so that the time period of
each basis function is not significantly less than 1, otherwise rounding
errors can become an issue when a large number of basis functions is
used~\hyperref[csl:21]{(Ramsay et al., 2009)}. It is also worth noting at
this point that, to ensure a consistent number of observations across the
technologies being compared, time series are resampled using simple
linear interpolation; the analysis that follows assumes that this
resampling does not
introduce significant errors into the assessment of the predictive
ability of different bibliometric indicator groups. In terms of
compatibility with feature alignment techniques, the work of Ramsay
provides well-documented evidence from previous studies
of how feature alignment processes (also referred to as `landmark
registration') often form a prerequisite to model building using
functional data approaches. As such, time series segmented and aligned
based on features, such as aligning technologies against common
Technology Life Cycle stages, have been shown to enable a single data
object to be generated for multiple curves that originally spanned
time periods of different
lengths~\hyperref[csl:21]{(Ramsay et al., 2009)}. Lastly, in applying
functional data analysis techniques to other
examples of growth curves (such as the U.S. Nondurable Goods Index),
Ramsay advocates the use of data transformation and smoothing in order
to be able to focus on long-term trends rather than periodic or seasonal
patterns~\hyperref[csl:25]{(Ramsay, 2013}; \hyperref[csl:35]{Ramsay, 2013)}.
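
The resampling assumption mentioned above can be illustrated as follows (assumed notation). Given observations $x_j$ at times $t_j$, the value at a new sample time $t$ with $t_j \le t \le t_{j+1}$ is estimated by linear interpolation as:

\begin{equation*}
    x(t) = x_j + \frac{t - t_j}{t_{j+1} - t_j}\left(x_{j+1} - x_j\right)
\end{equation*}

Evaluating this expression on an evenly spaced grid with the same number of points for every technology yields series of equal length. The interpolated values never fall outside the range of the two neighbouring observations, which limits the error that resampling can introduce.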

\subsection{Method selection}

{\label{655650}}

Based on the technology classification problem considered, the
bibliometric data available, and the methods discussed in
sections~{\ref{204737}}~to~{\ref{419943}}~the
following methods have been selected for use in this analysis:

\subsubsection{Technology Life Cycle stage matching
process}

{\label{637710}}

For those technologies where evidence for determining the transitions
between different stages of the Technology Life Cycle has either not
been found or is~incomplete, a nearest neighbour pattern recognition
approach has been employed based on the work of
Gao~\hyperref[csl:36]{(Gao et al., 2013)}~to locate the points where shifts between cycle
stages occur. However, for the technologies considered in this paper,
literature evidence has been identified for the transitions between
stages, and so the nearest neighbour methodology is not discussed
further here.

\subsubsection{Identification of significant patent indicator
groups}

{\label{688227}}

In order to identify those bibliometric indicator groupings that could
form the basis of a data-driven technology classification model a
combination of Dynamic Time Warping and the `PAM' variant of K-Medoids
clustering has been applied in this study. For the initial feature
alignment and distance measurement stages of this process, Dynamic Time
Warping is still widely recognised as the classification benchmark to
beat (see section~{\ref{367729}}), and so this study
does not look to advance the feature alignment processes used beyond
this. Unlike the Technology Life Cycle stage matching process which is
based on a well-established technology maturity model, this study is
assuming that a classification system based on the modes of substitution
outlined in section~{\ref{585124}} is not intrinsically
valid. For this reason an unsupervised learning approach has been
adopted here to enable human biases to be eliminated in determining
whether a classification system based on presumptive technological
substitution is valid or not, before subsequently defining a
classification rule system. In doing so this additionally means that
labelling of predicted clusters can be carried out even if labels are
only available for a small number of observed samples representative of
the desired classes, or potentially even if none of the observed samples
are absolutely defined. This is of particular use if this technique is
to be expanded to a wider population of technologies, as obtaining
evidence of the applicable mode of substitution that gave rise to the
current technology can be a time-consuming process, and in some cases
the necessary evidence may not be publicly available (i.e. if dealing
with commercially sensitive performance data). As such, clustering can
provide an indication of the likely substitution mode of a given
technology without the need for prior training on technologies that
belong to any given class. Under such circumstances this approach could
be applied without the need for collecting performance data, providing
that the groupings produced by the analysis are broadly identifiable
from inspection as being associated with the suspected modes of
substitution (this is of course made easier if a handful of examples are
known, but means that this is no longer a hard requirement). The `PAM'
variant of K-Medoids is selected here over Hierarchical clustering since
the expected number of clusters is known from the literature, and
keeping the number of clusters fixed allows for easier testing of how
frequently predicted clusters align with expected groupings.
Additionally, a small sample of technologies is evaluated in this study,
and as a result computational expense is not likely to be significant in
using the `PAM' variant of K-Medoids over Hierarchical clustering
approaches. It is also worth noting that by evaluating the predictive
performance of each subset of patent indicator groupings independently
it is possible to spot and rank commonly recurring patterns of subsets.
This is not possible when using approaches such as Linear Discriminant
Analysis, which can assess the impact of individual predictors but cannot
rank the most suitable combinations of indicators.
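
As an indication of the size of this search (a sketch assuming that every non-empty combination of the ten indicators in Table~{\ref{table:bibliometric_indicators}} is treated as a candidate grouping, which may be broader than the set actually evaluated), the number of candidate subsets is:

\begin{equation*}
    \sum_{r=1}^{10} \binom{10}{r} = 2^{10} - 1 = 1023
\end{equation*}

Each candidate can be clustered and scored independently, which is what allows recurring combinations of indicators to be ranked.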

\subsubsection{Ranking of significant patent indicator
groups}

{\label{730553}}

As the number of technologies considered in this study is relatively
small, exhaustive cross-validation approaches provide a feasible means
to rank the out-of-sample predictive capabilities of those bibliometric
indicator subsets that have been identified as producing significant
correlations to expected in-sample technology groupings. As such,
leave-p-out cross-validation approaches are applied for this purpose,
whilst also reducing the risk of over-fitting in the following model
building phases.
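
The feasibility of an exhaustive approach can be checked with a short calculation. Leave-$p$-out cross-validation on $n$ observations requires one train and test cycle for each of the $\binom{n}{p}$ possible validation sets. Taking $n = 23$ technologies and small values of $p$ (the values of $p$ shown are illustrative assumptions, not the settings used in this study):

\begin{equation*}
    \binom{23}{1} = 23, \qquad \binom{23}{2} = 253, \qquad \binom{23}{3} = 1771
\end{equation*}

These counts apply to each indicator subset being ranked, so the total cost scales with the number of subsets considered.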

\subsubsection{Model building}

{\label{805423}}

Due to the importance of phase variance when comparing historical trends
for different technologies, and the coupling that exists between
adjacent points in growth and adoption curves, functional linear
regression is selected here to build the technology classification model
developed in this study (see section~{\ref{419943}}).

\subsection{Method limitations}

{\label{518077}}

Although precautions have been taken where available to ensure that the
methods selected for this study address the problem posed of building a
generalised technology classification model based on bibliometric data
in as rigorous a fashion as possible, there are some known limitations
to the methods used in this work that must be recognised. Many of the
current limitations stem from the fact that in this analysis
technologies have been selected based on where evidence is obtainable to
indicate the mode of adoption followed. As such the technologies
considered here do not come from a truly representative cross-section of
all industries, so it is possible that models generated will provide a
better representation of those industries considered rather than a more
generalisable result. This evidence-based approach also means that it is
still currently a time-consuming process to locate the necessary
literature material to be able to support classifying technology
examples as arising based on one mode of substitution or another, and to
then compile the relevant cleaned patent datasets for analysis. As a
result only a relatively limited number of technologies have been
considered in this study, which should be expanded on to increase
confidence in the findings produced from this work. This also raises the
risk that~clustering techniques may struggle to produce consistent
results based on the small number of technologies considered.
Furthermore, any statistical or quantitative methods used for modelling
are unlikely to provide real depth of knowledge beyond the detection of
correlations behind patent trends when used in isolation. Ultimately
some degree of causal exploration, whether through case study
descriptions, system dynamics modelling, or expert elicitation will be
required to shed more light on the underlying influences shaping
technology substitution behaviours. Other data-specific issues that
could arise relate to the use of patent searches in this analysis and
the need to resample data based on variable length time series. The
former relates to the fact that patent search results and records can
vary to a large extent based on the database and exact search terms
used; however, overall trends, once normalised, should remain consistent
with other studies of this nature. The latter meanwhile refers to the
fact that functional linear regression requires all technology case
studies to be based on the same number of time samples. As
discussed in section~{\ref{419943}}, linear
interpolation is therefore used where required to ensure a consistent
number of observations; this may introduce some small errors, which are
not considered to be significant.

\subsection{Bibliometric data}

{\label{521682}}

Patent data has been sourced from the Questel-Orbit patent search
platform in this analysis. More specifically, the full FamPat database
was queried in this study, which groups related invention-based patents
filed in multiple international jurisdictions into families of
patents.~This platform is accessed by subscribers via an online search
engine that allows complex patent record searches to be structured,
saved, and exported in a variety of formats. A selection of keywords,
dates, or classification categories are used in this search engine to
build relevant queries for a given technology (this process is discussed
in more detail in~section {\ref{108157}}). The provided
search terms are then matched in the title, abstract, and key content of
all family members included in a FamPat record, although unlike title
and abstract searches, key contents searches (which include independent
claims, advantages, drawbacks, and the main patent object) are limited
to only English language publications. Some of the core functionalities
behind this search engine are outlined in \hyperref[csl:37]{(Lambert, 2000)}.

\section{Building a technology classification model from Technology Life
Cycle
features}

{\label{806047}}

\subsection{Patent indicator
definitions}

{\label{971055}}

The work of Gao identifies a range of studies that have been conducted
previously based on the principle of using either a single or multiple
bibliometric indicators as a means of investigating technological
development and performance~\hyperref[csl:36]{(Gao et al., 2013)}. Their review of these
methods concluded that multiple patent indicators are required to avoid
generating potentially unreliable results if just using a single
indicator extracted from patent data. As such, the nearest neighbour
classification process developed in Gao's study proposes the use of
thirteen separate patent indicators. This current study has accordingly
reproduced these metrics where possible, resulting in a total of ten
patent indicators (i.e. producing time series for each technology with
ten dimensions), as three of the previous list of indicators were
specific to the Derwent Innovation Index \hyperref[csl:38]{(``{Derwent innovations index version 4.0 offers expanded functionality}'', 2003)} which was
not used in this study due to the limited ability to bulk export the
necessary results from this database. As such,
Table~{\ref{table:bibliometric_indicators}}~summarises
the bibliometric indicators extracted for each technology within this
analysis.\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{2cm}|p{3cm}|p{11cm}}
    {Indicator No.} & {Name} & {Description} \\ \midrule
    1 & Application & Number of patents in Questel-Orbit by application year \\
    \hline
    2 & Priority & Number of patents in Questel-Orbit by priority year \\
    \hline
    3 & Corporate & Number of corporates in Questel-Orbit by priority year \\
    \hline
    4 & Non-corporate & Number of non-corporates in Questel-Orbit by priority year \\
    \hline
    5 & Inventor & Number of groups of inventors in Questel-Orbit by priority year \\
    \hline
    6 & Literature citation & Number of backward citations to literature in Questel-Orbit by priority year \\
    \hline
    7 & Patent citation & Number of backward citations to patents in Questel-Orbit by priority year \\
    \hline
    8 & IPC & Number of IPCs (4-digit) in Questel-Orbit by priority year \\
    \hline
    9 & IPC top 5 & Number of patents of top 5 IPCs in Questel-Orbit by priority year \\
    \hline
    10 & IPC top 10 & Number of patents of top 10 IPCs in Questel-Orbit by priority year \\
\end{tabular}%}
\caption{{Bibliometric indicators used in this study (based on the work of Gao [Gao 2013])}}
% \caption{Bibliometric indicators used in this study (based on the work of Gao \hyperref[csl:36]{(Gao et al., 2013)}}
\label{table:bibliometric_indicators}
\end{table*}

With the main exception of the use of the Questel-Orbit FamPat database
instead of the Derwent Innovation Index, the indicator definitions and
assumptions used in this study are consistent with those
outlined in sections 2.1.1 to 2.1.5
of~\hyperref[csl:36]{(Gao et al., 2013)}. The only other notable difference to be recorded is that the
Questel-Orbit patent records are not automatically given a designation
as being a corporate, non-corporate, or individual patent assignee. As
such, the counts of corporate and non-corporate indicators (which would
otherwise be based on this assignee designation) are determined instead
based on the `Family Normalized Assignee Name' field available in the
patent records, as records with entries in this field correspond to
corporate designations.

\subsection{Search strategy and terms for identifying relevant patent
profiles}

{\label{108157}}

Previous bibliometric studies have explored the many different ways in
which patent records can be correctly identified for a given field or
topic~\hyperref[csl:39]{(Verbeek et al., 2002}; \hyperref[csl:40]{Schmoch, 1997}; \hyperref[csl:41]{Albino et al., 2014}; \hyperref[csl:42]{Rizzi et al., 2014}; \hyperref[csl:43]{Mao et al., 2015}; \hyperref[csl:44]{Dong et al., 2012}; \hyperref[csl:45]{WIPO, 2009}; \hyperref[csl:46]{Helm et al., 2014)}. Whilst filtering of search results based on
technology classification categories is generally preferred where
possible to ensure a more rigorous search strategy~\hyperref[csl:41]{(Albino et al., 2014)},
it is also advisable to keep the steps that supplement or remove patents
from search queries to a minimum to maintain data consistency and
repeatability~\hyperref[csl:46]{(Helm et al., 2014)}. As such, the search queries used in
this analysis are based primarily on filtering by International Patent
Classification (IPC) or Cooperative Patent Classification (CPC) labels.
Where possible the IPC categories applied have been reused from previous
studies in order to replicate existing search queries so as to extract
comparable datasets, or have been based on expert-defined groupings
such as the European Patent Office's Y02 classification which
specifically relates to climate change mitigation technologies.
Otherwise, IPC labels are combined with keyword search terms that
require closely adjoining instances of the search terms (or of
their common synonyms) to be matched. The use of IPC technology category
filters in this manner ensures that a higher level of relevance and
repeatability is achieved. Based on these preprocessing steps, the final
search queries used for the technologies to be considered are presented
in Table~{\ref{table:search_terms}}.\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{4cm}|p{6cm}|p{7cm}|p{4cm}}
    {Case study} & {Orbit patent search keywords} & {IPC or CPC categories} & {No. of patent families} \\ \midrule
    Compact Fluorescent Lamp & (compact+ or CFL+ or (energ+ s (sav+ or low+))) AND fluores+ & CPC: Y02B-020/16+ OR Y02B-020/18+ OR Y02B-020/19+ & 1,169 (21/07/2017) \\
    \hline
    Electric vehicles & -- & CPC: Y02T-010/62+ OR Y02T-010/64+ OR Y02T-010/70+ OR Y02T-010/72+ OR Y02T-090/1+ & 100,870 (24/07/2017) \\
    \hline
    Fiber optics (data transfer) & ((fiber+ or fibre+) 3d optic+) & IPC: G02B OR H04B OR C03B OR C03C OR D01C OR D04H OR D06L OR G02F OR G06E OR G06K OR G11B OR G11C OR H02G OR H03K OR H04J OR H04N OR G01P & 176,299 (20/07/2017) \\
    \hline
    Geothermal electricity & -- & CPC: Y02E-010/1+ & 5,272 (24/07/2017) \\
    \hline
    Halogen lights & -- & CPC: Y02B-020/12+ & 645 (24/07/2017) \\
    \hline
    Hydro electricity & -- & CPC: Y02E-010/2+ & 46,125 (24/07/2017) \\
    \hline
    Impact/Dot-matrix printers & ((impact+ or (dot+ or matri+) or (daisy 1w wheel+)) 3d print+) & IPC: G03G OR B41J OR G06F OR G06K OR H04N OR G06T OR G02B OR H04L OR G01R OR G03C OR B41M OR G03B OR B65H & 24,993 (24/07/2017) \\
    \hline
    Incandescent lights & Incandescen+ or filament+ & IPC: F21H OR F21L OR F21S OR F21V OR F21W OR F21Y & 17,597 (03/08/2017) \\
    \hline
    Ink jet printer & (ink+ 3d jet+ 3d print+) & IPC: B41J-002/01 OR G03G OR B41J OR G06F OR G06K OR H04N OR G06T OR G02B OR H04L OR G01R OR G03C OR B41M OR G03B OR B65H & 46,135 (24/07/2017) \\
    \hline
    Internet & (internet+ 3d protocol+ 3d suite+) OR (computer+ 1w network+) & IPC: G06F OR H04L OR G06N OR H04K OR G09F & 42,861 (24/07/2017) \\
    \hline
    Landline telephones & (((land\_line+ or main\_line+ or home or fixed\_line+ or wire\_line+) 3d (+phone)) OR (speaking telegraph+) OR (telephon+)) NOT (mobil+ or (cell+ 3d (+phon+ or communi+)) or smart\_phon+ or port+) & IPC: H04B OR H01Q OR H01P OR H04J OR G01R OR H04Q OR H01H OR H04M OR H04R OR G10L & 139,895 (03/08/2017) \\
    \hline
    Laser printer & (laser+ 3d print+) & IPC: G03G OR B41J OR G06F OR G06K OR H04N OR G06T OR G02B OR H04L OR G01R OR G03C OR B41M OR G03B OR B65H & 17,827 (24/07/2017) \\
    \hline
    LED lights & -- & CPC: Y02B-020/3+ & 8,596 (24/07/2017) \\
    \hline
    Linear Fluorescent Tube lights & ((fluores+ 3d (lamp+ or light+ or tube+))) NOT (compact or (energ+ 3d sav+)) & IPC: F21K OR F21L OR F21S OR F21V OR F21W OR F21Y & 25,126 (24/07/2017) \\
    \hline
    Nuclear energy & -- & CPC: Y02E-030+ & 60,017 (24/07/2017) \\
    \hline
    Solar PV & -- & CPC: Y02E-010/5+ OR Y02E-010/6+ & 112,068 (24/07/2017) \\
    \hline
    Solar thermal electricity & -- & CPC: Y02E-010/4+ OR Y02E-010/6+ & 91,553 (24/07/2017) \\
    \hline
    TFT-LCD & ((((thin film+) 1w transistor+) or TFT+) AND (((liquid crystal+) 1w display+) or LCD)) or TFT\_LCD & IPC: G02F-001/13 & 5,181 (24/07/2017) \\
    \hline
    Thermal printers & (thermal+ 2d print+) & IPC: G03G OR B41J OR G06F OR G06K OR H04N OR G06T OR G02B OR H04L OR G01R OR G03C OR B41M OR G03B OR B65H & 23,388 (24/07/2017) \\
    \hline
    Tide-wave-ocean electricity & -- & CPC: Y02E-010/28+ OR Y02E-010/3+ & 19,224 (24/07/2017) \\
    \hline
    Turbojet & ((Gas w turbin+) or (jet+ w engine+) or turbo\_fan+ or turbo\_prop+ or turbo\_jet+ or turbo\_shaft+ or prop\_fan+ or ((open w rotor+) 3d (engine+ or technolog+ or counter\_rotat+))) & IPC: B60K OR B60L OR B60P OR B60V OR B61B OR B61C OR B62D OR B63B OR B63H OR B64C OR B64D OR B64F OR B64G OR F01D OR F02B OR F02C OR F02K & 71,024 (24/07/2017) \\
    \hline
    Wind electricity & -- & CPC: Y02E-010/7+ & 67,035 (24/07/2017) \\
    \hline
    Wireless data transfer & (Wireless 3d data 3d trans+) & IPC: H03K OR H04H OR H04W OR G06K OR G06T & 17,188 (24/07/2017) \\
\end{tabular}%}
\caption{{Patent data search terms}}
\label{table:search_terms}
\end{table*}

\subsection{Patent indicator data extraction
process}

{\label{467781}}

Search queries were constructed from the technology classification
categories and, where applicable, the keywords specified in
Table~{\ref{table:search_terms}}. The results of these
queries were exported in batches of up to 10,000 records at a
time in a tabulated HTML format. Only the representative family member
of each FamPat grouping was exported, in order to
avoid duplication of records across multiple jurisdictions.
Each exported record included the key patent information
along with full details of the patent and non-patent literature
references cited in that record. As some searches generate
very large numbers of records (i.e. hundreds of thousands), batch
processing kept the exports to a manageable size, but required that
the batches were subsequently imported into a tool capable of
processing the volumes of data considered. MATLAB was used for this
purpose: a script (\textbf{\emph{provided in Appendix XX}}), adapted
from a pre-existing conversion script, was developed to convert each
HTML batch file into a corresponding .MAT file, ready for data cleaning.

\subsection{Patent indicator data cleaning
process}

{\label{918099}}

Whilst the Questel-Orbit patent data is highly consistent, several
cleaning steps are still required before patent indicator metrics can
be extracted from it. This process is not discussed in
detail here, \textbf{\emph{but further information is available in
Appendix XX}}.

\subsection{Technology Life Cycle stage matching
process}

{\label{242296}}

With bibliometric profiles extracted for each of the technologies
considered in this study, the first stage of analysis consists of
identifying the transition points between different stages of the
Technology Life Cycle, in order to establish time series segments for use
in subsequent comparative analysis. For each technology, evidence of
when these transitions occurred was identified from the literature,
such as the innovation timeline assessments prepared for a range of
technologies by Hanna et al.~\hyperref[csl:47]{(2015)}. Full details of
the transition points used in this study are provided in
Table~{\ref{table:TLC_transition_points}}.\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
% \begin{tabular*}{\textwidth}{p{4.5cm}|p{2cm}|p{2cm}|p{2cm}|p{8cm}}
\begin{tabular}{p{4.5cm}|p{2.5cm}|p{2.5cm}|p{2.5cm}|p{8cm}}
    {Case study} & {Last year of Emergence stage} & {Last year of Growth stage} & {Last year of Maturity stage} & {Technology Life Cycle transition point sources} \\ \midrule
    Compact Fluorescent Lamps & 1979 & 2011 & -- & \hyperref[csl:47]{(Hanna et al., 2015}; \hyperref[csl:48]{Weiss et al., 2008)} \\
    \hline
    Electric vehicles & 1997 & 2005 & -- & \hyperref[csl:49]{(Ranaei et al., 2014}; \hyperref[csl:50]{Yuan and Miyazaki, 2014)} \\
    \hline
    Fiber optics (data transfer) & 1970 & 1990 & -- & \hyperref[csl:51]{(Cattani, 2006}; \hyperref[csl:52]{Hecht, 2004)} \\
    \hline
    Geothermal electricity & 1958 & -- & -- & \hyperref[csl:53]{(Glassley, 2014)} \\
    \hline
    Halogen lights & 1959 & -- & -- & \hyperref[csl:54]{(IEA, 2006}; \hyperref[csl:55]{Menanteau and Lefebvre, 2000}; \hyperref[csl:56]{Europe and others, 2009)} \\
    \hline
    Hydro electricity & 1956 & 1975 & -- & \hyperref[csl:57]{(Connelly and Sekhar, 2012)} \\
    \hline
    Impact/Dot-matrix printers & 1970 & 1984 & 1991 & \hyperref[csl:58]{(Mayadas et al., 1986}; \hyperref[csl:59]{Tomash, 1990}; \hyperref[csl:60]{Agrawal and Dwoskin, 2003}; \hyperref[csl:61]{Clymer and Asaba, 2008}; \hyperref[csl:62]{Acee, 2001)} \\
    \hline
    Incandescent lights & 1882 & 1916 & 2008 & \hyperref[csl:11]{(Chang and Baek, 2010}; \hyperref[csl:63]{Gendre, 2003}; \hyperref[csl:56]{Europe and others, 2009)} \\
    \hline
    Ink jet printer & 1988 & 1996 & 2003 & \hyperref[csl:61]{(Clymer and Asaba, 2008)} \\
    \hline
    Internet & 1982 & 2000 & -- & \hyperref[csl:64]{(Lemstra, n.d.}; \hyperref[csl:65]{Zakon, 1997}; \hyperref[csl:66]{von Stackelberg, 2011)} \\
    \hline
    Landline telephones & 1878 & 1945 & 2009 & \hyperref[csl:67]{(Ortt and Schoormans, 2004}; \hyperref[csl:68]{ITU, 2013)} \\
    \hline
    Laser printer & 1979 & 1993 & -- & \hyperref[csl:69]{(Grant et al., 2013}; \hyperref[csl:59]{Tomash, 1990)} \\
    \hline
    LED lights & 2001 & -- & -- & \hyperref[csl:47]{(Hanna et al., 2015)} \\
    \hline
    Linear Fluorescent Tube lights & 1937 & 1990 & 2012 & \hyperref[csl:54]{(IEA, 2006}; \hyperref[csl:70]{Tidd et al., 1997}; \hyperref[csl:71]{K{\"o}hler, 2013)} \\
    \hline
    Nuclear electricity & 1963 & 1981 & -- & \hyperref[csl:47]{(Hanna et al., 2015)} \\
    \hline
    Solar PV & 1990 & -- & -- & \hyperref[csl:47]{(Hanna et al., 2015)} \\
    \hline
    Solar thermal electricity & 1968 & -- & -- & \hyperref[csl:72]{(EIA, 2008}; \hyperref[csl:73]{Grubler et al., 2012)} \\
    \hline
    TFT-LCD & 1990 & 2007 & -- & \hyperref[csl:36]{(Gao et al., 2013)} \\
    \hline
    Thermal printers & 1972 & 1985 & 2002 & \hyperref[csl:74]{(McLoughlin, n.d.}; \hyperref[csl:75]{Gregory, 1996}; \hyperref[csl:59]{Tomash, 1990}; \hyperref[csl:76]{Scientific, 2007}; \hyperref[csl:77]{Cartridges, 2017)} \\
    \hline
    Tide-wave-ocean electricity & 1966 & -- & -- & \hyperref[csl:78]{(Tester et al., 2012}; \hyperref[csl:79]{Corsatea, 2014)} \\
    \hline
    Turbojet & 1939 & 1958 & -- & \hyperref[csl:80]{(Geels, 2006)} \\
    \hline
    Wind electricity & 1982 & -- & -- & \hyperref[csl:47]{(Hanna et al., 2015)} \\
    \hline
    Wireless data transfer & 1982 & 2002 & -- & \hyperref[csl:47]{(Hanna et al., 2015)} \\
\end{tabular}%}
\caption{{Technology Life Cycle transition points based on literature evidence}}
\label{table:TLC_transition_points}
\end{table*}

Of the 23 technologies listed in
Table~{\ref{table:TLC_transition_points}}, 20 were
found to have patent data available for the emergence stage
(the exceptions being incandescent lights, landline telephones, and
wireless data transfer). Only these 20 technologies are therefore
considered in the analysis that follows.

To allow this analysis to be extended to additional technologies
for which literature evidence defining these segments is not readily
available, a nearest neighbour pattern matching process was also
developed, as outlined in section~{\ref{637710}}, based
on the work of Gao et al.~\hyperref[csl:36]{(2013)}. This methodology is
not discussed in further detail in this paper.

\subsection{Identification of significant patent indicator
groups}

{\label{249496}}

Having defined the time periods corresponding to each Technology Life
Cycle stage for the technologies considered, it is now possible to
segment the bibliometric time series into comparable phases of
development. Significant predictors of substitution modes in each
Technology Life Cycle stage are then identified using the procedure
outlined in Fig. {\ref{978289}}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Identification-of-significant-patent-indicator-groups-process/Identification-of-significant-patent-indicator-groups-process}
\caption{{Overview of the process used to identify and rank significant patent
indicator groups
{\label{978289}}%
}}
\end{center}
\end{figure}

As discussed in sections~{\ref{688227}}
and~{\ref{730553}}, an unsupervised learning approach
has been employed here, based on applying Dynamic Time Warping and the
`PAM' variant of K-Medoids clustering to the relative distance measures
calculated between time series. This is again implemented as a MATLAB
script based on the DTW and K-Medoids functions made available by
MathWorks \hyperref[csl:32]{(MathWorks, 2016}; \hyperref[csl:33]{``{Dynamic Time Warping Clustering}'', 2015)}, \textbf{\emph{which is provided in
Appendix XX}}. The first step of this process involves generating a list
of all the unique non-empty subsets that can be created from the ten
patent indicator metrics considered in this study. This produces
1,023 (i.e.~\(2^{10}-1\)) possible combinations of the ten patent
indicators to be tested, as illustrated by
Fig.~{\ref{564829}}.\selectlanguage{english}
\begin{figure*}[h!]
\begin{center}
\includegraphics[width=400]{figures/Build-list-of-all-possible-patent-indicator-groupings3/Build-list-of-all-possible-patent-indicator-groupings3}
\caption{{Generating list of all possible patent indicator groupings from time
series dimensions considered
{\label{564829}}%
}}
\end{center}
\end{figure*}
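
The count of 1,023 follows from summing the number of ways of choosing
\(k\) indicators out of ten over every non-empty subset size (the symbol
\(k\) is introduced here purely for illustration):
\begin{equation*}
\sum_{k=1}^{10} \binom{10}{k} = 2^{10} - \binom{10}{0} = 1024 - 1 = 1023 .
\end{equation*}
Of these, \(\binom{10}{1} = 10\) subsets are single indicators,
\(\binom{10}{2} = 45\) are pairs and \(\binom{10}{3} = 120\) are triples.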

Next, the raw patent data time series are transformed by using an
inverse hyperbolic sine function and normalised to convert the data into
a suitable format for long-term comparisons (see discussion in
section~{\ref{606353}}). Once in this format, the data
points are filtered based on the current Technology Life Cycle stage
being considered, as illustrated by
Fig.~{\ref{967185}}, ensuring comparable curve features
are considered.\selectlanguage{english}
\begin{figure*}[h!]
\begin{center}
\includegraphics[width=400]{figures/Transform-the-data-into-suitable-format-for-long-term-comparisons1/Transform-the-data-into-suitable-format-for-long-term-comparisons}
\caption{{Transforming extracted patent data time series into a suitable format
for long-term comparisons
{\label{967185}}%
}}
\end{center}
\end{figure*}
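
For reference, the inverse hyperbolic sine of a raw count \(x\) (a symbol
introduced here for illustration) is
\begin{equation*}
\operatorname{arsinh}(x) = \ln\left(x + \sqrt{x^{2}+1}\right),
\end{equation*}
which equals zero at \(x = 0\), is approximately linear for small counts,
and approaches \(\ln(2x)\) for large counts. Unlike a plain logarithm, it
is therefore defined for years in which no patents are recorded. The
normalisation step that follows is not restated here (see
section~{\ref{606353}}).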

After the datasets have been transformed and filtered based on the
current Technology Life Cycle stage, Dynamic Time Warping is used
to calculate the distance, built from Euclidean point-wise costs,
between each pair of technology time series when compared using the
time series dimensions specified by each patent indicator grouping in
turn. This process is depicted visually in
Fig.~{\ref{497938}}, illustrating the successive layers
of filtering that are applied for each technology pairing and each
patent indicator grouping considered. The output from this process is an
\(i \times j \times 1023\) distance matrix, where \(i\) and
\(j\) index the two technologies in the current pairing, and
each entry is the measured distance between the multi-dimensional time
series for the patent indicator subset in question. In
parallel, the corresponding warping paths required to measure the
distance between the \(N\)-dimensional curves in each condition are
stored in two separate matrices for later use.\selectlanguage{english}
\begin{figure*}[h!]
\begin{center}
\includegraphics[width=400]{figures/Calculate-distance-between-each-pair-of-technology-time-series-for-each-indicator-grouping/Calculate-distance-between-each-pair-of-technology-time-series-for-each-indicator-grouping}
\caption{{Calculating the distance between each pair of technology time series for
each indicator grouping
{\label{497938}}%
}}
\end{center}
\end{figure*}
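
A common textbook formulation of Dynamic Time Warping is sketched below
for orientation. The notation (\(X\), \(Y\), \(D\), \(s\), \(t\)) is
introduced here for illustration, and the MATLAB implementation used in
this study may differ in detail. For two multi-dimensional series
\(X = (x_1, \dots, x_S)\) and \(Y = (y_1, \dots, y_T)\), the cumulative
cost is built up recursively as
\begin{align*}
D(s,t) ={}& \lVert x_s - y_t \rVert_2 \\
 &+ \min\{ D(s-1,t),\, D(s,t-1),\, D(s-1,t-1) \},
\end{align*}
with \(D(0,0) = 0\) and \(D(s,0) = D(0,t) = \infty\) for \(s, t \geq 1\).
The distance between the two series is then \(D(S,T)\), and the warping
path is the sequence of index pairs \((s,t)\) traced back from \((S,T)\)
to \((1,1)\).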

Using this distance matrix, K-Medoids clustering can now be applied
to determine the technology groupings predicted when each
specific patent indicator subset is used. By comparing the predicted
technology groupings to those expected from the earlier literature
classifications (see section~{\ref{585124}}), a
confusion matrix is created for each patent indicator subset, showing
the alignment between predicted and target groupings as illustrated in
Fig.~{\ref{450923}}. Fisher's exact test is then
applied to each confusion matrix to calculate the probability of
obtaining an alignment at least as strong as that observed by chance
alone. Significant patent indicator subsets are identified as those for
which this probability is less than 5\%.\selectlanguage{english}
\begin{figure*}[h!]
\begin{center}
\includegraphics[width=400]{figures/Identifying-patent-indicator-groups-of-interest/Identifying-patent-indicator-groups-of-interest}
\caption{{Identifying patent indicator groups of interest
{\label{450923}}%
}}
\end{center}
\end{figure*}
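
For a \(2 \times 2\) confusion matrix with cell counts \(a\), \(b\),
\(c\) and \(d\) and total \(n = a+b+c+d\) (notation introduced here for
illustration, and assuming two predicted and two target groupings), the
probability of one particular table under the null hypothesis of no
association, with the row and column totals held fixed, is given by the
hypergeometric distribution:
\begin{align*}
P &= \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} \\
  &= \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!} .
\end{align*}
The p-value of Fisher's exact test is the sum of these probabilities over
the observed table and all tables with the same margins that are at
least as extreme. Whether a one-sided or two-sided version was used is
not stated here.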

\subsection{Ranking of grouped patent indicator
dimensions}

{\label{307966}}

As discussed in
sections~{\ref{606353}},~{\ref{258858}},
and~{\ref{730553}}, leave-p-out cross-validation
techniques provide a means to rank those bibliometric indicator subsets
that have been identified as producing a significant match to the
expected technology groupings. The first stage of this process consists
of generating lists of all possible training technology combinations and
corresponding test technology combinations based on leaving one
technology out at a time. The procedure then follows a similar
format to the initial calculation of distances between each pair of
technology time series shown in Fig.~{\ref{497938}},
except that distance measures are now only calculated between
pairs of training technologies, and the process is repeated for
every available combination of training technologies.
The output from this process is therefore an
\(i \times j \times 1023 \times n\) distance matrix, where \(i\) and
\(j\) now index the current \textbf{training} technology pairing, and
\(n\) is the number of training combinations available.
This is illustrated in Fig.~{\ref{361537}}.\selectlanguage{english}
\begin{figure*}[h!]
\begin{center}
\includegraphics[width=400]{figures/Calculate-distance-between-each-pair-of-training-technologies-for-each-indicator-grouping/Calculate-distance-between-each-pair-of-training-technologies-for-each-indicator-grouping}
\caption{{Calculating the distance between each pair of training technologies for
each indicator grouping
{\label{361537}}%
}}
\end{center}
\end{figure*}

K-Medoids clustering is once again applied to the resulting training
technology distance matrices, from which two medoid technologies are
identified for each patent indicator subset in each training condition.
Each test technology is then evaluated individually
against the two medoid curves identified in each training condition, in
order to determine the medoid closest to that test technology.
This provides a classification for the test technologies based on each
training condition and each patent indicator subset, from which the
number of test technologies misclassified in each training
condition can be determined. This in turn is used to calculate the
average number of test technologies misclassified for each patent
indicator grouping across all of the training conditions considered.
Finally, the results are sorted in ascending order of the average number
of misclassifications in order to rank the robustness of each patent
indicator grouping. This procedure is illustrated in
Fig.~{\ref{428246}}.\selectlanguage{english}
\begin{figure*}[h!]
\begin{center}
\includegraphics[width=400]{figures/Ranking-of-grouped-patent-indicator-dimensions/Ranking-of-grouped-patent-indicator-dimensions}
\caption{{Ranking of grouped patent indicator dimensions
{\label{428246}}%
}}
\end{center}
\end{figure*}
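
The ranking score described above can be written compactly as follows,
using notation introduced here for illustration only. If \(m_{g,k}\)
denotes the number of test technologies misclassified by patent indicator
grouping \(g\) in training condition \(k\), and \(n\) is the number of
training conditions, then the score for grouping \(g\) is
\begin{equation*}
\bar{m}_g = \frac{1}{n} \sum_{k=1}^{n} m_{g,k},
\end{equation*}
and groupings are ranked from the smallest \(\bar{m}_g\) upwards. In a
leave-p-out scheme applied to \(N\) technologies there are
\(n = \binom{N}{p}\) training conditions, which reduces to \(n = N\) when
one technology is left out at a time.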

\subsection{Functional model building
process}

{\label{311620}}

The ranking of different bibliometric indicator subsets provides a means
to identify the time series dimensions that, when combined, are most
likely to provide robust out-of-sample predictions of the observed
technological modes of substitution. As a result, a technology
classification model is now developed using functional data analysis
(see sections~{\ref{419943}}
and~{\ref{805423}}) that is based on indicators 4 and 6
(i.e. the number of non-corporates and the number of cited references by
priority year). Besides being present in all of the highest scoring sets
of top ranked predictors, these particular dimensions can potentially be
associated with the rate of development in technology and science
respectively. The number of cited references shows a clear
link to scientific production directly influencing technological
development efforts, whilst the number of non-corporates by priority
year (which counts the number of universities, academies, non-profit
labs and technology research centres) is associated with the amount of
lab work required to commercialise a technology. Considering the measure
of non-corporates by priority year specifically, a large volume of lab
work could indicate a lack of technological maturity, or the presence of
considerable complexity in the technology being developed. By contrast,
technologies with less non-corporate activity by priority year
may represent simpler technologies that mature more rapidly or
intuitively. Non-corporates by priority year could therefore serve as a
measure of technological complexity, or of the effort required to mature.

However, it is also worth noting that other indicator subsets
(pairs and triples) perform nearly as well. These
other high-performing subsets may be related in some way to the chosen
indicators (i.e. perfect orthogonality cannot necessarily be assumed
between these metrics). The indicators specified have therefore been
chosen at this point because they have been seen to be the most
statistically robust, whilst also being in good agreement with previous
literature conclusions.

Following on from the initial introduction to functional data analysis
provided in section~{\ref{419943}}, and more detailed
methods presented in~\hyperref[csl:21]{(Ramsay et al., 2009)}, the method outlined in
Fig.~{\ref{529107}} has been implemented in MATLAB for
building a functional linear regression model for the purposes of
technology classification (\textbf{\emph{the MATLAB script is available
in Appendix XX for further details}}).\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.84\columnwidth]{figures/Functional-model-building-process/Functional-model-building-process}
\caption{{Functional model building process based on methods outlined
in~\protect\hyperref[csl:21]{(Ramsay et al., 2009)}
{\label{529107}}%
}}
\end{center}
\end{figure}

Taking the chosen time series dimensions as a starting point, a
functional data object must first be created for each of the patent
indicators (or model components) included in the chosen subset. However,
as the Technology Life Cycle stages being considered span a
different number of observations for each case study technology, it is
first necessary to resample the segmented time series onto a common
number of resampling points. This ensures that if, for example, a
Technology Life Cycle stage spans 20 years in one time series and 50
years in another, both time series will have 50 observations, which
enables the two curves to be aligned relative to each other for the
current Technology Life Cycle stage. Next, a B-spline basis system is
created for each model component based on the common number of
resampling points defined, and at the same time for the regression
coefficient functions (\(\beta_i\)) to be estimated by the functional
linear regression analysis (see Eq.~{\ref{eq:basis_function_1}} and
Eq.~{\ref{eq:linear_regression}} in
section~{\ref{419943}}, as well as sections 3.4.1,
3.4.2, 9.4.1 and 9.4.2 of
\hyperref[csl:21]{(Ramsay et al., 2009)}), as illustrated in
Fig.~{\ref{416597}}.\selectlanguage{english}
\begin{figure*}[h!]
\begin{center}
\includegraphics[width=400]{figures/Building-functional-models-of-selected-patent-indicator-groupings/Building-functional-models-of-selected-patent-indicator-groupings}
\caption{{Building functional models of selected patent indicator groupings
{\label{416597}}%
}}
\end{center}
\end{figure*}

Before functional data objects can be generated from the B-spline basis
systems, the degree of curve smoothing to be applied has to be
determined. Following the process outlined in~\hyperref[csl:21]{(Ramsay et al., 2009)}, a
`functional parameter object' that allows smoothness to be imposed on
estimated functional parameters is created (see section 5.2.4 of
\hyperref[csl:21]{(Ramsay et al., 2009)}). A functional data object is then created for the
current model component using the new functional parameter object, along
with an initial value of the smoothing parameter (\(\lambda\)).
The degrees of freedom and the generalised cross-validation (GCV)
criterion (see section 5.3 of~\hyperref[csl:21]{(Ramsay et al., 2009)}) can then be
calculated for the current functional data object. By repeating this
process for a range of~\(\lambda\) values and plotting the
results (not shown here), a suitable smoothing parameter can be
identified for use in the final functional data object for
each model component. An example of a smoothed functional data object
generated for the number of corporations associated with different
technologies in a given priority year is illustrated in
Fig.~{\ref{135605}}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1.00\columnwidth]{figures/Functional-Data-Object-for-all-technology-profiles-based-on-corporates-by-priority-year-(edited)/Functional-Data-Object-for-all-technology-profiles-based-on-corporates-by-priority-year-(edited)}
\caption{{Functional Data Object for all technology profiles based on corporates
by priority year
{\label{135605}}%
}}
\end{center}
\end{figure}
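
The quantities referred to above take the following form in the
treatment of~\hyperref[csl:21]{(Ramsay et al., 2009)}. The symbols used
here (\(y_j\), \(t_j\), \(x\), \(m\)) are introduced for illustration,
and the use of a second-derivative roughness penalty is an assumption,
since the order of the penalty is not stated in this section. The
smoothed curve \(x(t)\) minimises the penalised sum of squared errors
\begin{align*}
\mathrm{PENSSE}_{\lambda}(x) ={}& \sum_{j=1}^{m} \left[ y_j - x(t_j) \right]^2 \\
 &+ \lambda \int \left[ D^2 x(t) \right]^2 dt ,
\end{align*}
and the generalised cross-validation criterion is
\begin{equation*}
\mathrm{GCV}(\lambda) = \left( \frac{m}{m - \mathit{df}(\lambda)} \right) \left( \frac{\mathrm{SSE}}{m - \mathit{df}(\lambda)} \right),
\end{equation*}
where \(m\) is the number of resampled observations, \(\mathrm{SSE}\) is
the residual sum of squares and \(\mathit{df}(\lambda)\) is the trace of
the smoothing matrix. Larger values of \(\lambda\) give smoother curves
with fewer effective degrees of freedom, and a suitable \(\lambda\) is
one at or near the minimum of \(\mathrm{GCV}(\lambda)\).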

Having created a functional data object representation of each model
component from the selected bibliometric subset, the MATLAB script then
assesses the fit of each functional data object to the trend data. This
is accomplished by calculating the residuals, variance, and standard
deviations between the real and modelled values, both across the
different technology curves included and across the time span of the
Technology Life Cycle stage considered (see section 5.5
of~\hyperref[csl:21]{(Ramsay et al., 2009)}). A related sanity check for the functional data
objects generated for each model component (before they are used in the
functional linear regression analysis) is the plotting of functional
descriptive statistics (see section 6.1.1 of~\hyperref[csl:21]{(Ramsay et al., 2009)}). The
functional mean and standard deviation of the functional data objects
for the number of non-corporates and the number of cited references by
priority year are shown in Fig.~{\ref{413726}} and
Fig.~{\ref{437199}} respectively, and show that for
both model components variability increases as time progresses (as would
be expected with most forecasts). In addition, the mean functional data
object values show a notable early surge in non-corporates
by priority year during the emergence phase, before a technology achieves
mainstream adoption. This corresponds well to the hype cycle associated
with new technologies during early development, when significant levels
of R\&D are first launched in a race to achieve commercialisation, which
can often prove premature or short-lived. By contrast, the mean cited
references by priority year measure shows steadily accelerating
growth during the emergence phase, without significant
undulation, potentially implying that scientific development efforts are
less fazed by disturbances as they begin to accumulate.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1.00\columnwidth]{figures/Mean-functional-data-object-values-for-chosen-patent-indicator-subset-(edited)/Mean-functional-data-object-values-for-chosen-patent-indicator-subset---emergence}
\caption{{Mean functional data object values for chosen patent indicator subset
{\label{413726}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1.00\columnwidth]{figures/Standard-deviation-of-functional-data-objects-created-for-chosen-patent-indicator-subset-(edited)/Standard-deviation-of-functional-data-objects-created-for-chosen-patent-indicator-subset---emergence}
\caption{{Standard deviation of functional data objects created for chosen patent
indicator subset
{\label{437199}}%
}}
\end{center}
\end{figure}

\subsubsection{Identification of smoothing parameter values for
regression
coefficients}

{\label{945131}}

With the functional data objects for each model component now ready, a
cell array containing each model component along with a constant
predictor term is generated for use in the functional linear regression.
Before the final regression analysis can be run, a smoothing parameter
for the regression coefficient beta basis system has to be selected.
This is achieved by calculating leave-one-out cross-validation scores
(i.e. error sum of squares values) for functional responses using a
range of different smoothing parameter values, as per sections 9.4.3 and
10.6.2 of~\hyperref[csl:21]{(Ramsay et al., 2009)}. The functional parameter object used in
the beta basis system is then redefined based on the refined smoothing
parameter identified, in order to ensure that the functional linear
regression analysis converges on a model that has the best chances of
performing well out-of-sample.
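
In generic notation (introduced here for illustration, and written for a
scalar response), the leave-one-out cross-validation score for a
candidate smoothing parameter \(\lambda\) is
\begin{equation*}
\mathrm{CV}(\lambda) = \sum_{i=1}^{N} \left[ y_i - \hat{y}_i^{(-i)}(\lambda) \right]^2 ,
\end{equation*}
where \(y_i\) is the observed response for technology \(i\),
\(\hat{y}_i^{(-i)}(\lambda)\) is the prediction for that technology from
a model fitted to the other \(N-1\) technologies, and \(N\) is the number
of technologies. The value of \(\lambda\) giving the smallest score is
the one carried forward.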

\section{Results and Discussion}

{\label{559459}}

The functional linear regression analysis is now run with the identified
smoothing parameters and scalar response variables to identify
the~\(\beta_i\) coefficients and the corresponding variance, used
to define the 95\% confidence bounds (see sections 9.4.3 and 9.4.4
of~\hyperref[csl:21]{(Ramsay et al., 2009)} respectively).
Fig.~{\ref{820059}} to
Fig.~{\ref{942889}} show the
resulting~\(\beta_i\) coefficients and confidence bounds for the
number of non-corporates and the number of cited references by priority
year, when considering the emergence phase of development and using a
high-dimensional regression fit (i.e. when the beta basis system for
each regression coefficient is made of a large number of B-splines).
This regression fit successfully identifies the correct mode of
substitution from patent data available in the emergence stage for 19 of
the 20 technologies considered.

From the confidence bounds on these plots it can be seen that, for both
the number of non-corporates and the number of cited references by
priority year, the variance is highest at the start of the emergence
phase. This is not entirely surprising, as this is often when the least
data is available for comparing each technology and so
represents the point of greatest uncertainty. However,
Fig.~{\ref{822351}} and
Fig.~{\ref{942889}} also illustrate how the influence
these two patent dimensions have on the predicted mode of substitution
varies with time during the emergence phase. More specifically,
deviations away from zero in these coefficient functions equate to an
increased positive or negative weighting for the associated patent
indicator count at that moment in time, within the determination of the
predicted mode of substitution. As such, it can be seen that any patent
indicator counts at \(t = 0\) for the number of non-corporates by
priority year (assuming these are present) will have a more significant
influence on the final mode of substitution predicted. Equally, these
particular regression results would suggest that the impact of
non-corporates activity next peaks around 40\% of the way through the
emergence phase (potentially corresponding to the hype effect suggested
previously), and again at the end of the emergence phase. For the number
of cited references by priority year, this regression model suggests
that the times of greatest impact on the mode of substitution are at the
very beginning and at the very end of the emergence stage.
Whilst these coefficient plots give some indication of
the relative weighting applied to patent indicator counts as time
progresses, the cumulative nature of the inner products used in
Eq.~{\ref{eq:linear_regression}} makes it difficult to
infer visually from these plots alone which mode the technology under
evaluation is currently converging towards. For this it is also
necessary to include the corresponding patent indicator count values
that these coefficient terms are multiplied by for the specific
technology being assessed.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1.00\columnwidth]{figures/Estimated-regression-coefficient-for-the-constant-functional-basis-system---emergence/Estimated-regression-coefficient-for-the-constant-functional-basis-system---emergence}
\caption{{Estimated regression coefficient for the constant functional basis
system - emergence
{\label{820059}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1.00\columnwidth]{figures/Estimated-regression-coefficient-for-predicting-technology-cluster-from-cited-patents-by-priority-year---emergence/Estimated-regression-coefficient-for-predicting-technology-cluster-from-non-corporates-by-priority-year---emergence}
\caption{{Estimated regression coefficient for predicting technology cluster from
non-corporates by priority year - emergence
{\label{822351}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1.00\columnwidth]{figures/Estimated-regression-coefficient-for-predicting-technology-cluster-from-cited-references-by-priority-year---emergence/Estimated-regression-coefficient-for-predicting-technology-cluster-from-cited-references-by-priority-year---emergence}
\caption{{Estimated regression coefficient for predicting technology cluster from
cited references by priority year - emergence
{\label{942889}}%
}}
\end{center}
\end{figure}

Whilst the regression coefficient plots help to provide a possible
interpretation of the relationship between the different model
components and the predicted technology substitution classifications, it
is also necessary to check the `goodness-of-fit' measures associated
with these results. As such, R-squared, adjusted R-squared, and F-ratio
statistics are calculated (see sections 9.4.1 and 9.4.2
of~\hyperref[csl:21]{(Ramsay et al., 2009)}) to assess the overall fit of the high-dimensional
functional linear regression model, and are summarised in
Table~{\ref{table:results_high_dimensional_model}}.\selectlanguage{english}
\begin{table*}
\centering
\begin{tabular}{p{1.5cm}p{1.5cm}p{1.5cm}p{1.7cm}p{1.7cm}p{1.5cm}}
    {Correct mode type} & {R-squared} & {Adjusted R-squared} & {Degrees of freedom 1} & {Degrees of freedom 2} & {F-ratio} \\ \midrule
    19/20 & 0.7954 & 0.7713 & 7.7837 & 11.2163 & 5.6024 \\
\end{tabular}
\caption{{Results of high dimensional model fit}}
\label{table:results_high_dimensional_model}
\end{table*}
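
As a consistency check on
Table~{\ref{table:results_high_dimensional_model}}, the tabulated
statistics can be related through the usual definitions. The sketch below
assumes $n = 20$ technologies and treats the two functional covariates as
$p = 2$ predictors in the adjusted R-squared; the latter is an assumption
inferred from the tabulated values rather than something stated in the
text.
\begin{align*}
F &= \frac{R^2/\mathrm{df}_1}{(1-R^2)/\mathrm{df}_2}
   = \frac{0.7954/7.7837}{0.2046/11.2163} \approx 5.60,\\
\bar{R}^2 &= 1-(1-R^2)\,\frac{n-1}{n-p-1}
   = 1-0.2046\times\frac{19}{17} \approx 0.7713.
\end{align*}
Both values agree with the table to the precision shown, which suggests
that the reported F-ratio and adjusted R-squared follow from the reported
R-squared and degrees of freedom in this way.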

The R-squared and adjusted R-squared values shown in
Table~{\ref{table:results_high_dimensional_model}}
suggest that a reasonable fit has been achieved with this
model, which assigns the correct mode to 19 of the 20 technologies. The
F-ratio of 5.60, with degrees of freedom of 7.78 and 11.22 respectively,
implies that the relationship established has a p-value somewhere between
0.0041 and 0.0060. As such, this result appears to be significant at the
1\% level.
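
The p-value for an F-ratio with non-integer degrees of freedom can also be
evaluated directly rather than bracketed. The following is a minimal
sketch in Python using only the standard library; the function names
(\texttt{f\_sf}, \texttt{reg\_inc\_beta}) are illustrative and are not
part of the analysis code used in this study. It relies on the standard
identity between the upper tail of the F distribution and the regularised
incomplete beta function, evaluated with a continued fraction.

\begin{verbatim}
```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function
    (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(ln_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) variable.
    df1 and df2 may be non-integer."""
    if f <= 0.0:
        return 1.0
    x = df2 / (df2 + df1 * f)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, x)


# Values taken from the high-dimensional model results table.
p_high = f_sf(5.6024, 7.7837, 11.2163)
print(f"p-value (high-dimensional model): {p_high:.4f}")
```
\end{verbatim}

For two numerator degrees of freedom the upper tail has the closed form
$(1 + 2F/\mathrm{df}_2)^{-\mathrm{df}_2/2}$, which gives a simple way of
checking a routine of this kind.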

However, to ensure that this is the most appropriate fit to the data
presented, the high-dimensional model initially developed was
subsequently benchmarked against a low-dimensional model (i.e. when the
beta basis system for each regression coefficient is made of a small
number of B-splines), as well as a constant and a monomial based model.
The corresponding `goodness-of-fit' measures for the alternative
functional linear regression models are compiled in
Table~{\ref{table:results_benchmarking}}.\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{2.5cm}p{1.5cm}p{1.5cm}p{1.5cm}p{1.7cm}p{1.7cm}p{1.5cm}p{1.5cm}}
    {Model basis} & {Correct mode type} & {R-squared} & {Adjusted R-squared} & {Degrees of freedom 1} & {Degrees of freedom 2} & {F-ratio} & {p-value} \\ \midrule
    Low dimension & 19/20 & 0.8514 & 0.8340 & 10 & 9 & 5.1584 & 0.0107 \\
    Constant & 18/20 & 0.6200 & 0.5753 & 2 & 17 & 13.8684 & 0.0003 \\
    Monomial & 19/20 & 0.8139 & 0.7920 & 8 & 11 & 6.0139 & 0.0040 \\
\end{tabular}%}
\caption{{Benchmarking results}}
\label{table:results_benchmarking}
\end{table*}

Whilst the R-squared and adjusted R-squared measures observed in
Table~{\ref{table:results_benchmarking}} would suggest
that the low-dimensional model provides a better fit, the associated
F-ratio and corresponding p-value (0.0107) suggest a lower significance
than the values observed for the high-dimensional model. Conversely,
the constant basis model does not appear to provide as good a fit to the
expected scalar responses according to the R-squared and adjusted
R-squared values, but this is not surprising considering the more limited
nature of models built on constant terms. Finally, the monomial basis
system performs fractionally better than the high-dimensional model on
both the R-squared and adjusted R-squared measures whilst also achieving
a comparable level of significance. Consequently, from this
benchmarking analysis it would appear that the high-dimensional and
monomial basis system models are the most suitable candidates, but it is
possible that the overall performance of the high-dimensional model
could be further improved by sensitivity studies into the optimum number
of B-splines to use in the regression fit.

To further validate the statistical significance of the four models
considered here, permutation testing is applied to count the proportion
of permuted F values that are larger than the observed F-statistic for
each model (see section 9.5 of~\hyperref[csl:21]{(Ramsay et al., 2009)}). This involves repeatedly
shuffling the expected mode classification labels relative to the
technology profiles (which are kept in their original order) and
refitting the regression model to these reordered responses.
In so doing, the test builds a null distribution of F values against
which the~\emph{q}\textsuperscript{th} quantile and the observed
F-statistic of each model can be compared. The results of this analysis
are shown in Fig.~19.
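
The permutation procedure described above can be sketched in a few lines
of Python. This is an illustration only: it uses an ordinary
least-squares line on a single scalar predictor in place of the
functional linear regression used in this study, the data are synthetic,
and the function names are hypothetical.

\begin{verbatim}
```python
import random


def f_statistic(x, y):
    """F-ratio of a least-squares line y = a + b*x
    (1 and n - 2 degrees of freedom)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    ss_reg = sxy ** 2 / sxx      # variation explained by the line
    ss_res = syy - ss_reg        # residual variation
    if ss_res <= 0.0:
        return float("inf")      # perfect fit
    return ss_reg / (ss_res / (n - 2))


def permutation_f_test(x, y, n_perm=1000, seed=0):
    """Shuffle the responses y against the fixed predictors x and
    return (observed F, null F values, permutation p-value)."""
    rng = random.Random(seed)
    f_obs = f_statistic(x, y)
    y_perm = list(y)
    null_f = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)      # predictors keep their order
        null_f.append(f_statistic(x, y_perm))
    # Proportion of permuted F values at least as large as observed.
    p_value = sum(f >= f_obs for f in null_f) / n_perm
    return f_obs, null_f, p_value


# Synthetic example: 20 profiles summarised by one scalar each,
# with binary mode labels that follow the predictor.
x_demo = list(range(20))
y_demo = [0] * 10 + [1] * 10
f_obs, null_f, p_value = permutation_f_test(x_demo, y_demo)
print("observed F:", round(f_obs, 2))
print("permutation p-value:", p_value)
```
\end{verbatim}

If the labels carry no information about the profiles, the observed
F-statistic should fall within the body of the shuffled values; a small
proportion of larger permuted values indicates that the fit depends on
the original pairing of labels and profiles.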

\graphicspath{{figures/}}

%\begin{figure}[ht!]\selectlanguage{english}
\begin{figure*}[htbp!]
    \begin{center}
%
        \subfloat[High-dimensional basis system \label{fig:first}]{%
            \includegraphics[width=0.5\textwidth]{Permutation F-Test and null distribution for the high-dimensional functional regression model - emergence/Permutation F-Test and null distribution for the high-dimensional functional regression model - emergence.png}
        }%
        \subfloat[Low-dimensional basis system \label{fig:second}]{%
           \includegraphics[width=0.5\textwidth]{Permutation F-Test and null distribution for the low-dimensional functional regression model - emergence/Permutation F-Test and null distribution for the low-dimensional functional regression model - emergence.png}
        }\\ %  ------- End of the first row ----------------------%
        \subfloat[Constant basis system \label{fig:third}]{%
            \includegraphics[width=0.5\textwidth]{Permutation F-Test and null distribution for the constant basis system functional regression model - emergence/Permutation F-Test and null distribution for the constant basis system functional regression model - emergence.png}
        }%
        \subfloat[Monomial basis system \label{fig:fourth}]{%
            \includegraphics[width=0.5\textwidth]{Permutation F-Test and null distribution for the monomial basis system functional regression model - emergence/Permutation F-Test and null distribution for the monomial basis system functional regression model - emergence.png}
        }%
%
    \end{center}
    \caption{{%
        Permutation F-Test and null distributions for functional regression models - emergence \label{fig:permutation_&_null_distributions}
    }}
\end{figure*}

For statistical significance, the observed test statistic must lie in
the tail of the null distribution generated. At this stage of the
analysis the high- and low-dimensional models perform best, as their
observed F-statistics lie furthest along each distribution's right tail,
in relative terms, compared with those of the constant and monomial based
models. These distributions also suggest that a similar level of
statistical significance is observed between the high- and
low-dimensional models, although as this permutation testing was only
based on 1,000 permutations, the distributions could still evolve further
with a greater number of permutations. However, the constant basis system
model is more clearly seen here not to perform as well, with the observed
F-statistic closest to the main body of the distribution. This, in
combination with the other `goodness-of-fit' measures, would therefore
suggest that the high-dimensional functional linear regression model
provides the best basis for a technology substitution classification
model from those tested in this analysis.
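
As a rough guide to how far a permutation estimate could still move, the
Monte Carlo standard error of a p-value estimated from $B$ random
permutations is approximately binomial. Taking an illustrative true
p-value of $p = 0.01$ (an assumed figure, not a result of this study)
and $B = 1000$:
\begin{equation*}
\mathrm{SE}(\hat{p}) \approx \sqrt{\frac{p(1-p)}{B}}
  = \sqrt{\frac{0.01 \times 0.99}{1000}} \approx 0.0031 .
\end{equation*}
An estimate near the 1\% level would therefore carry an uncertainty of
roughly $\pm 0.006$ (two standard errors) at this number of permutations.
Because the standard error scales with $1/\sqrt{B}$, halving it would
require four times as many permutations.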

\section{Conclusions from statistical ranking and functional data
analysis}\label{conclusions-from-statistical-ranking-and-functional-data-analysis}

Expanding on previous historical accounts of technological substitutions,
this study has examined the premise that two principal modes are often
observed when considering transitions between successive commercially
prevalent technologies: reactive and presumptive technological
substitutions. These two modes are believed to correspond to
significantly different technology adoption characteristics (not
discussed in this paper), with scientific foresight believed to play a
crucial role in the identification of presumptive innovations, and
performance stagnation leading to reactive transitions. In both cases,
technological anomalies are believed to arise, as a result of either
scientific or technological crisis, that subsequently trigger the
eventual shift to the next technological paradigm. As such, this paper
has considered 23 example technologies where literature evidence of
performance development trends has been found in order to test the
ability to correctly identify associated adoption modes using
bibliometric, pattern recognition, and statistical analysis techniques.
The results obtained from this analysis suggest that statistical
analysis of patent indicator time series, segmented based on identified
Technology Life Cycle features, provides a possible means for
classification of technological substitutions. Specifically, for the
datasets considered, measures of the number of cited references and the
involvement of non-corporate entities by year during the emergence phase
were found to provide a good indication of the expected mode of
substitution when used as a basis for functional linear regression
(correctly classifying 19 out of 20 technologies included in this
stage), and performed consistently well in statistical ranking of
predictive capability. These selected patent data dimensions can be
associated with perceptions of scientific and technological production
respectively, consistent with the basic prerequisites listed in
section~{\ref{585124}} for a classification scheme that
can identify presumptive technological substitutions. Whilst these two
patent dimensions occur in all of the most robust predictor subsets
(i.e. in terms of out-of-sample reliability) when basing analysis on the
emergence stage, this does not prove that these are the only indicators
capable of predicting modes of technological substitution. As discussed
in section~{\ref{311620}}, the possibility of
orthogonality has not been ruled out with regards to the other patent
indicators shown in
Table~{\ref{table:bibliometric_indicators}}. However,
these two dimensions are in good agreement with the technological
anomaly arguments put forward by Constant in
sections~{\ref{646617}}
and~{\ref{585124}}, and so were felt to be reasonable
for forming the basis of the technology classification model that has
been developed using functional linear regression. In particular, a
regression fit made up of beta coefficient functions with many B-spline
elements was found to provide a viable means of correctly matching the
mode of substitution to the technology profile being evaluated when
considering multiple `goodness of fit' measures. Permutation testing of
the derived technology classification model further suggests that the
regression fit is sensitive to the ordering of the expected mode labels
relative to the technology time series being considered, so this
relationship would appear to be based on the specifics of the individual
technology curves considered, and does not appear to be occurring by
chance. This implies that it may be possible to predict modes of
substitution from limited bibliometric data during the earliest stages
of technology development, provided that some evaluation of progress
through the early stages of the Technology Life Cycle is made (this can
be obtained using a nearest neighbour matching process, not discussed in
this paper). Equally, this shows that the functional data approach
employed corroborates the earlier statistical rankings produced
using Dynamic Time Warping, K-Medoids clustering, and leave-one-out
cross-validation of the selected patent indicators, suggesting that
these two methods are compatible for this type of analysis.

It is also important to remember the potential limitations of this study
that would need to be addressed for further confidence in the
methodology used. Firstly, only a relatively small number of
technologies have been evaluated in this study due to the time-consuming
process required for data extraction, preparation, and identification of
supporting evidence from literature for the assignment of expected
classification labels. Consequently, whilst precautions have been taken
to minimise the risk of model over-fitting, the cross-validation
procedures employed would benefit from further verification with a more
diverse spread of technologies to ensure that out-of-sample errors are
accurately captured here. Regression models based on small sample sizes
can be very sensitive to the datasets they are calibrated to, so it cannot
be ruled out that the results presented here reflect a fit specific to the
industries included in this analysis, rather than a model that can
necessarily be generalised to all technologies. However, perhaps the most
important note of caution regarding this work relates to the
quantitative approaches used here. Whilst statistical approaches are
well-suited to detecting underlying correlations in historical and
experimental datasets, this on its own does not provide a detailed
understanding of the causation behind associated events, particularly in
this case when considering the breadth of reasons for technological
stagnations, `failures', or presumptive leaps to occur. Equally,
statistical methods are not generally well suited to predicting
disruptive events and complex interactions, with other simulation
techniques such as System Dynamics and Agent Based Modelling performing
better in these areas. Accordingly, to identify causation effects and
test the sensitivity of technological substitution patterns to
variability arising from real-world socio-technical behaviours not
captured in simple bibliometric indicators (such as the influence of
competition, organisational, and economic effects), the fitted
regression model presented here also needs to be evaluated in a causal
environment. Similarly, in order to demonstrate practical applicability,
the modes of substitution considered here need to be related to observed
adoption characteristics (not discussed in this paper). Consequently, a
System Dynamics model built on the regression functions identified in
this study is proposed (although not discussed here) in order to
calibrate these extracted technology profiles and mode predictions to
empirical adoption data. This aims to more thoroughly explore the causal
mechanisms relating early indicators of technological substitution to
the eventual adoption patterns observed and provide a means of applying
greater reasoning to the relationships identified here.

\section{Acknowledgements}

{\label{687807}}

TBD

\selectlanguage{english}
\FloatBarrier
\section*{References}\sloppy
\phantomsection
\label{csl:22} n.d.

\phantomsection
\label{csl:23} 2013.

\phantomsection
\label{csl:24} 2015.

\phantomsection
\label{csl:28} 2014.

\phantomsection
\label{csl:33} 2015.

\phantomsection
\label{csl:34} 2012.

\phantomsection
\label{csl:38} 2003. . Microelectronics International 20. \url{https://doi.org/10.1108/mi.2003.21820cab.008}

\phantomsection
\label{csl:54} 2006. . {OECD} Publishing. \url{https://doi.org/10.1787/9789264109520-en}

\phantomsection
\label{csl:62}Acee, H., 2001. {Disruptive technologies: an expanded view} (PhD thesis). Massachusetts Institute of Technology.

\phantomsection
\label{csl:60}Agrawal, A., Dwoskin, J., 2003. {Timeline of Printers}.

\phantomsection
\label{csl:41}Albino, V., Ardito, L., Dangelico, R.M., Petruzzelli, A.M., 2014. {Understanding the development trends of low-carbon energy technologies: A patent analysis}. Applied Energy 135, 836–854. \url{https://doi.org/10.1016/j.apenergy.2014.08.012}

\phantomsection
\label{csl:30}Bagnall, A., Lines, J., Bostrom, A., Large, J., Keogh, E., 2016. {The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances}. Data Mining and Knowledge Discovery 31, 606–660. \url{https://doi.org/10.1007/s10618-016-0483-9}

\phantomsection
\label{csl:3}Bass, F.M., 2004. {Comments on {\textquotedblleft}A New Product Growth for Model Consumer Durables The Bass Model{\textquotedblright}}. Management Science 50, 1833–1840. \url{https://doi.org/10.1287/mnsc.1040.0300}

\phantomsection
\label{csl:77}Cartridges, R.B., 2017. {The Evolution Of Printing}.

\phantomsection
\label{csl:51}Cattani, G., 2006. {Technological pre-adaptation speciation, and emergence of new technologies: how Corning invented and developed fiber optics}. Industrial and Corporate Change 15, 285–318. \url{https://doi.org/10.1093/icc/dtj016}

\phantomsection
\label{csl:11}Chang, Y.S., Baek, S.J., 2010. {Limit to improvement: Myth or reality?}. Technological Forecasting and Social Change 77, 712–729. \url{https://doi.org/10.1016/j.techfore.2010.02.010}

\phantomsection
\label{csl:13}Christensen, C.M., Rosenbloom, R.S., 1995. {Explaining the attacker{\textquotesingle}s advantage: Technological paradigms organizational dynamics, and the value network}. Research Policy 24, 233–257. \url{https://doi.org/10.1016/0048-7333(93)00764-k}

\phantomsection
\label{csl:61}Clymer, N., Asaba, S., 2008. {A new approach for understanding dominant design: The case of the ink-jet printer}. Journal of Engineering and Technology Management 25, 137–156. \url{https://doi.org/10.1016/j.jengtecman.2008.06.003}

\phantomsection
\label{csl:57}Connelly, M.C., Sekhar, J.A., 2012. {U. S. energy production activity and innovation}. Technological Forecasting and Social Change 79, 30–46. \url{https://doi.org/10.1016/j.techfore.2011.05.001}

\phantomsection
\label{csl:79}Corsatea, T.D., 2014. {Increasing synergies between institutions and technology developers: Lessons from marine energy}. Energy Policy 74, 682–696. \url{https://doi.org/10.1016/j.enpol.2014.07.006}

\phantomsection
\label{csl:4}Datt{\'{e}}e, B., Weil, H.B., 2007. {Dynamics of social factors in technological substitutions}. Technological Forecasting and Social Change 74, 579–607. \url{https://doi.org/10.1016/j.techfore.2007.03.003}

\phantomsection
\label{csl:44}Dong, B., Xu, G., Luo, X., Cai, Y., Gao, W., 2012. {A bibliometric analysis of solar power research from 1991 to 2010}. Scientometrics 93, 1101–1117. \url{https://doi.org/10.1007/s11192-012-0730-9}

\phantomsection
\label{csl:72}EIA, 2008. {Energy timelines - Solar Thermal}.

\phantomsection
\label{csl:56}Europe, N.C., others, 2009. {Life cycle assessment of ultra-efficient lamps}, in: Department of Environment. FARA UK.

\phantomsection
\label{csl:5}Foster, R.N., 1986. {Innovation: The Attacker's Advantage}, McKinsey and Company, New York.

\phantomsection
\label{csl:14}Foster, R.N., 1985. {Timing technological transitions}. Technology in Society 7, 127–141. \url{https://doi.org/10.1016/0160-791x(85)90022-3}

\phantomsection
\label{csl:36}Gao, L., Porter, A.L., Wang, J., Fang, S., Zhang, X., Ma, T., Wang, W., Huang, L., 2013. {Technology life cycle analysis method based on patent documents}. Technological Forecasting and Social Change 80, 398–407. \url{https://doi.org/10.1016/j.techfore.2012.10.003}

\phantomsection
\label{csl:80}Geels, F.W., 2006. {Co-evolutionary and multi-level dynamics in transitions: The transformation of aviation systems and the shift from propeller to turbojet (1930{\textendash}1970)}. Technovation 26, 999–1016. \url{https://doi.org/10.1016/j.technovation.2005.08.010}

\phantomsection
\label{csl:63}Gendre, M.F., 2003. {Two centuries of electric light source innovations}. URL: \url{http://ufowaves.org/rendlesham/Rendlesham\%20Incident\%20Forum/www.einlightred.tue.nl/lightsources/history/light_history.pdf} 143.

\phantomsection
\label{csl:53}Glassley, W.E., 2014. {Geothermal energy: renewable energy and the environment}. CRC Press.

\phantomsection
\label{csl:7}Gooday, G., 1998. {Re-writing the `book of blots': Critical reflections on histories of technological `failure'}. History and Technology 14, 265–291. \url{https://doi.org/10.1080/07341519808581934}

\phantomsection
\label{csl:69}Grant, A.E., Meadows, J.H., others, 2013. {Communication technology update and fundamentals}. Taylor \& Francis.

\phantomsection
\label{csl:75}Gregory, P., 1996. {Chemistry and technology of printing and imaging systems}. Springer.

\phantomsection
\label{csl:73}Grubler, A., Aguayo, F., Gallagher, K.S., Hekkert, M., Jiang, K., Mytelka, L., Neij, L., Nemet, G.F., Wilson, C., 2012. {Policies for the energy technology innovation system (ETIS)}.

\phantomsection
\label{csl:2}Hall, J., Rosson, P., 2006. {The Impact of Technological Turbulence on Entrepreneurial Behavior Social Norms and Ethics: Three Internet-based Cases}. Journal of Business Ethics 64, 231–248. \url{https://doi.org/10.1007/s10551-005-5354-z}

\phantomsection
\label{csl:47}Hanna, R., Gross, R., Speirs, J., Heptonstall, P., Gambhir, A., 2015. {Innovation timelines from invention to maturity}. UK Energy Research Centre.

\phantomsection
\label{csl:52}Hecht, J., 2004. {City of light: the story of fiber optics}. Oxford University Press on Demand.

\phantomsection
\label{csl:46}Helm, S., Tannock, Q., Iliev, I., 2014. {Renewable energy technology: Evolution and policy implications -- evidence from patent literature}. Global Challenges Report. World Intellectual Property Organization, Geneva. \url{http://www.wipo.int/export/sites/www/policy/en/climate_change/pdf/ccmt_report.pdf}

\phantomsection
\label{csl:27}Hyndman, R.J., 2010. {Transforming data with zeros}.

\phantomsection
\label{csl:12}Constant, E.W., II, 1973. {A Model for Technological Change Applied to the Turbojet Revolution}. Technology and Culture 14, 553. \url{https://doi.org/10.2307/3102443}

\phantomsection
\label{csl:68}ITU, I.T.U., 2013. {Measuring the information society}.

\phantomsection
\label{csl:71}K{\"o}hler, A.R., 2013. {Anticipatory eco-design strategies for smart textiles}. Perspectives on environmental risk prevention in the development of an emerging technology.

\phantomsection
\label{csl:37}Lambert, N., 2000. {Orbit and Questel-Orbit: Farewell and Hail}.

\phantomsection
\label{csl:64}Lemstra, W., n.d. {The Internet Bubble}.

\phantomsection
\label{csl:19}Lin, J., Williamson, S., Borne, K., DeBarr, D., 2012. {Pattern recognition in time series}. Advances in Machine Learning and Data Mining for Astronomy 1, 617–645.

\phantomsection
\label{csl:18}Little, A.D., 1981. {The strategic management of technology}. Arthur D. Little.

\phantomsection
\label{csl:1}Lucas, H.C., Goh, J.M., 2009. {Disruptive technology: How Kodak missed the digital photography revolution}. The Journal of Strategic Information Systems 18, 46–55. \url{https://doi.org/10.1016/j.jsis.2009.01.002}

\phantomsection
\label{csl:20}Lucero, J.C., Koenig, L.L., 2000. {Time normalization of voice signals using functional data analysis}. The Journal of the Acoustical Society of America 108, 1408–1420. \url{https://doi.org/10.1121/1.1289206}

\phantomsection
\label{csl:43}Mao, G., Liu, X., Du, H., Zuo, J., Wang, L., 2015. {Way forward for alternative energy research: A bibliometric analysis during 1994{\textendash}2013}. Renewable and Sustainable Energy Reviews 48, 276–286. \url{https://doi.org/10.1016/j.rser.2015.03.094}

\phantomsection
\label{csl:15}Martin, B.R., 1996. {The use of multiple indicators in the assessment of basic research}. Scientometrics 36, 343–362. \url{https://doi.org/10.1007/bf02129599}

\phantomsection
\label{csl:31}MathWorks, 2016. {Distance between signals using dynamic time warping}.

\phantomsection
\label{csl:32}MathWorks, 2016. {k-medoids clustering}.

\phantomsection
\label{csl:58}Mayadas, A.F., Durbeck, R.C., Hinsberg, W.D., McCrossin, J.M., 1986. {The evolution of printers and displays}. {IBM} Systems Journal 25, 399–416. \url{https://doi.org/10.1147/sj.253.0399}

\phantomsection
\label{csl:74}McLoughlin, I., n.d. {Computer Peripherals}.

\phantomsection
\label{csl:55}Menanteau, P., Lefebvre, H., 2000. {Competing technologies and the diffusion of innovations: the emergence of energy-efficient lamps in the residential sector}. Research Policy 29, 375–389. \url{https://doi.org/10.1016/s0048-7333(99)00038-4}

\phantomsection
\label{csl:16}Narin, F., Hamilton, K.S., 1996. {Bibliometric performance measures}. Scientometrics 36, 293–310. \url{https://doi.org/10.1007/bf02129596}

\phantomsection
\label{csl:26}Nau, R., n.d. {The logarithm transformation}.

\phantomsection
\label{csl:67}Ortt, J.R., Schoormans, J.P.L., 2004. {The pattern of development and diffusion of breakthrough communication technologies}. European Journal of Innovation Management 7, 292–302. \url{https://doi.org/10.1108/14601060410565047}

\phantomsection
\label{csl:8}Pye, D., 1978. {Nature and aesthetics of design}.

\phantomsection
\label{csl:25}Ramsay, J., 2013. {Dissecting the U.S. Nondurable Goods Index}.

\phantomsection
\label{csl:35}Ramsay, J., 2013. {Smoothing the Nondurable Goods Index}.

\phantomsection
\label{csl:21}Ramsay, J., Hooker, G., Graves, S., 2009. {Functional Data Analysis with R and {MATLAB}}. Springer New York. \url{https://doi.org/10.1007/978-0-387-98185-7}

\phantomsection
\label{csl:49}Ranaei, S., Karvonen, M., Suominen, A., K{\"a}ssi, T., 2014. {Forecasting emerging technologies of low emission vehicle}, in: Management of Engineering \& Technology (PICMET), 2014 Portland International Conference On. IEEE, pp. 2924–2937.

\phantomsection
\label{csl:42}Rizzi, F., van Eck, N.J., Frey, M., 2014. {The production of scientific knowledge on renewable energies: Worldwide trends dynamics and challenges and implications for management}. Renewable Energy 62, 657–671. \url{https://doi.org/10.1016/j.renene.2013.08.030}

\phantomsection
\label{csl:9}Schilling, M.A., Esmundo, M., 2009. {Technology S-curves in renewable energy alternatives: Analysis and implications for industry and government}. Energy Policy 37, 1767–1781. \url{https://doi.org/10.1016/j.enpol.2009.01.004}

\phantomsection
\label{csl:40}Schmoch, U., 1997. {Indicators and the relations between science and technology}. Scientometrics 38, 103–116. \url{https://doi.org/10.1007/bf02461126}

\phantomsection
\label{csl:76}Scientific, G., 2007. {UNITED STATES SECURITIES AND EXCHANGE COMMISSION - FORM 10-K}.

\phantomsection
\label{csl:10}Sood, A., Tellis, G.J., 2005. {Technological Evolution and Radical Innovation}. Journal of Marketing 69, 152–168. \url{https://doi.org/10.1509/jmkg.69.3.152.66361}

\phantomsection
\label{csl:78}Tester, J.W., Drake, E.M., Driscoll, M.J., Golay, M.W., Peters, W.A., 2012. {Sustainable energy: choosing among options}. MIT press.

\phantomsection
\label{csl:70}Tidd, J., Bessant, J.R., Pavitt, K., 1997. {Managing innovation: integrating technological, market and organizational change}. Wiley Chichester.

\phantomsection
\label{csl:59}Tomash, E., 1990. {The U.S. Computer Printer Industry}.

\phantomsection
\label{csl:29}Twomey, B., n.d. {Simple Vs. Exponential Moving Averages}.

\phantomsection
\label{csl:6}Utterback, J.M., 1994. {Mastering the dynamics of innovation: how companies can seize opportunities in the face of technological change}. Harvard Business School Press, Boston, MA.

\phantomsection
\label{csl:17}Verbeek, A., Debackere, K., Luwel, M., Zimmermann, E., 2002. {Measuring progress and evolution in science and technology - I: The multiple uses of bibliometric indicators}. International Journal of Management Reviews 4, 179–211.

\phantomsection
\label{csl:39}Verbeek, A., Debackere, K., Luwel, M., Zimmermann, E., 2002. {Measuring progress and evolution in science and technology - I: The multiple uses of bibliometric indicators}. International Journal of Management Reviews 4, 179–211. \url{https://doi.org/10.1111/1468-2370.00083}

\phantomsection
\label{csl:45}WIPO, 2009. {Patent-based Technology Analysis Report - Alternative Energy Technology}.

\phantomsection
\label{csl:48}Weiss, M., Junginger, H.M., Patel, M.K., 2008. {Learning energy efficiency: experience curves for household appliances and space heating, cooling, and lighting technologies}.

\phantomsection
\label{csl:50}Yuan, F., Miyazaki, K., 2014. {Understanding the dynamic nature of technological change using trajectory identification based on patent citation network in the Electric Vehicles industry}, in: Management of Engineering \& Technology (PICMET), 2014 Portland International Conference On. IEEE, pp. 2780–2790.

\phantomsection
\label{csl:65}Zakon, R., 1997. {Hobbes{\textquotesingle} Internet Timeline}. {RFC} Editor. \url{https://doi.org/10.17487/rfc2235}

\phantomsection
\label{csl:66}von Stackelberg, P., 2011. {Technological Eras}.

\end{document}