For statistical significance, the observed test statistic must lie in the tail of the permutation distribution generated, implying that a fit of this strength would arise only very rarely by chance if the data order were rearranged. The classification models were generated from the most robust predictors identified in the earlier cross-validation exercise, and all four imply that some relationship specific to the data provided has been identified between the expected substitution mode predictions and the two patent indicator dimensions used, although as seen in Tables \ref{table:results_high_dimensional_model} and \ref{table:results_benchmarking} the fit achieved varies depending on the model used. In this last stage of the analysis, the permutation testing suggests that the high- and low-dimensional models are likely to perform best out-of-sample, as their observed F-statistics lie furthest along the right tail of their respective distributions, relative to those generated for the constant and monomial-based models. This indicates that the fits of these two models are the least likely to have arisen by chance, and that they are the most likely to generalise to future datasets. A similar level of statistical significance is observed between the high- and low-dimensional models, although as this permutation testing was based on only 1,000 permutations, the distributions could still evolve further with a greater number of permutations. However, the constant basis system model clearly does not perform as well out-of-sample, with its observed F-statistic lying closest to the main body of its distribution.
This, in combination with the other 'goodness-of-fit' measures shown in Tables \ref{table:results_high_dimensional_model} and \ref{table:results_benchmarking}, would therefore suggest that the high-dimensional functional linear regression model provides the best basis for a technology substitution classification model from those tested in this analysis.
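As an illustration of the permutation procedure described above, the following minimal Python sketch shuffles the response labels relative to the predictors, refits the model for each shuffle, and compares the observed F-statistic with the resulting null distribution. It is not the functional linear regression used in this study: an ordinary least-squares fit on scalar predictors is substituted as a simplified stand-in, and the function names (`f_statistic`, `permutation_test`), the two-predictor layout, and the synthetic data are assumptions introduced purely for illustration.

```python
import numpy as np


def f_statistic(X, y):
    """Overall F-statistic of an ordinary least-squares fit with an intercept."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_model = np.sum((fitted - y.mean()) ** 2)  # variation explained by the fit
    ss_resid = np.sum((y - fitted) ** 2)         # variation left unexplained
    return (ss_model / p) / (ss_resid / (n - p - 1))


def permutation_test(X, y, n_permutations=1000, seed=0):
    """Return the observed F, the permutation null distribution, and a p-value."""
    rng = np.random.default_rng(seed)
    observed = f_statistic(X, y)
    # Shuffling y breaks any real link between the labels and the predictors.
    null = np.array(
        [f_statistic(X, rng.permutation(y)) for _ in range(n_permutations)]
    )
    # Proportion of permuted fits at least as strong as the observed fit;
    # the +1 terms count the observed ordering itself as one permutation.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, null, p_value


# Synthetic illustration only: 20 hypothetical technologies, two scalar
# predictors, and binary mode labels that depend on those predictors.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(20, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(float)

observed, null, p_value = permutation_test(X_demo, y_demo, n_permutations=1000)
print(f"observed F = {observed:.2f}, permutation p-value = {p_value:.4f}")
```

With 1,000 permutations the smallest p-value this estimator can report is 1/1001, which is one reason why the tails of the distributions may continue to evolve as the number of permutations is increased.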
Method limitations
Although precautions have been taken where available to ensure that the methods selected for this study address the problem of building a generalised technology classification model from bibliometric data as rigorously as possible, there are some known limitations to the methods used that must be recognised. Many of the current limitations stem from the fact that technologies have been selected where evidence is obtainable to indicate the mode of adoption followed. As such, the technologies considered here do not come from a truly representative cross-section of all industries, so the models generated may represent the industries considered better than they represent technologies in general. This evidence-based approach also means that it remains time-consuming to locate the literature needed to support classifying each technology example under one mode of substitution or the other, and then to compile the relevant cleaned patent datasets for analysis. As a result, only a relatively limited number of technologies have been considered in this study; this number should be expanded to increase confidence in the findings. The small number of technologies also raises the risk that clustering techniques may struggle to produce consistent results. Furthermore, statistical or quantitative modelling methods used in isolation are unlikely to provide real depth of knowledge beyond the detection of correlations in patent trends. Ultimately, some degree of causal exploration, whether through case study descriptions, system dynamics modelling, or expert elicitation, will be required to shed more light on the underlying influences shaping technology substitution behaviours.
Other data-specific issues relate to the use of patent searches in this analysis and the need to resample data from variable-length time series. The former arises because patent search results and records can vary considerably depending on the database and exact search terms used; however, overall trends once normalised should remain consistent with other studies of this nature. The latter arises because functional linear regression requires all technology case studies to be based on the same number of time samples. As such, as discussed in Appendix A, linear interpolation is used where required to ensure a consistent number of observations, at the cost of possibly introducing some small errors, which are not felt to be significant.
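The resampling step can be sketched as follows. This is a minimal Python illustration of linear interpolation onto a common number of time samples, not a reproduction of the procedure in Appendix A: the function name `resample_series`, the use of a normalised time axis running from 0 to 1, and the example values are all assumptions made for illustration.

```python
import numpy as np


def resample_series(values, n_samples):
    """Linearly interpolate a time series onto n_samples evenly spaced points.

    The original observations are assumed to be evenly spaced; both the
    original and resampled series are placed on a normalised [0, 1] time axis.
    """
    values = np.asarray(values, dtype=float)
    if values.size < 2 or n_samples < 2:
        raise ValueError("need at least two observations and two output samples")
    old_grid = np.linspace(0.0, 1.0, values.size)
    new_grid = np.linspace(0.0, 1.0, n_samples)
    return np.interp(new_grid, old_grid, values)


# Hypothetical yearly indicator counts for two technologies with emergence
# phases of different lengths, both resampled to ten observations.
short_series = [3, 5, 9, 14]
long_series = [1, 1, 2, 4, 4, 7, 11, 12]
resampled = np.vstack(
    [resample_series(series, 10) for series in (short_series, long_series)]
)
print(resampled.shape)
```

Because the first and last grid points coincide with the original endpoints, the start and end values of each series are preserved exactly, and interpolation error is confined to the points in between.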
Conclusions from statistical ranking and functional data analysis
Expanding on previous historical accounts of technological substitutions, this study has examined the premise that two principal modes are often observed when considering transitions between successive commercially prevalent technologies: reactive and presumptive technological substitutions. These two modes are believed to correspond to significantly different technology adoption characteristics (not discussed in this paper), with scientific foresight believed to play a crucial role in the identification of presumptive innovations, and performance stagnation leading to reactive transitions. In both cases, technological anomalies are believed to arise as a result of either scientific or technological crisis, and subsequently to trigger the eventual shift to the next technological paradigm. As such, this paper has considered 23 example technologies for which literature evidence of performance development trends has been found, in order to test the ability to correctly identify observed adoption modes using bibliometric, pattern recognition, and statistical analysis techniques. The results obtained suggest that statistical analysis of patent indicator time series, segmented according to identified Technology Life Cycle features, provides a possible means of classifying technological substitutions. Specifically, for the datasets considered, the number of cited references and the involvement of non-corporate entities by year during the emergence phase were found to provide a good indication of the expected mode of substitution when used as a basis for functional linear regression (correctly classifying 19 out of 20 technologies included in this stage), and performed consistently well in statistical ranking of predictive capability.
These selected patent data dimensions can be associated with perceptions of scientific and technological production respectively, consistent with the basic prerequisites listed in section \ref{585124} for a classification scheme that can identify presumptive technological substitutions.
Whilst these two patent dimensions occur in all of the most robust predictor subsets (i.e. in terms of out-of-sample reliability) when basing analysis on the emergence stage, this does not prove that these are the only indicators capable of predicting modes of technological substitution. As discussed in section \ref{311620}, the possibility of orthogonality has not been ruled out with regard to the other patent indicators shown in Table \ref{table:bibliometric_indicators}. However, these two dimensions are in good agreement with the technological anomaly arguments put forward by Constant in sections \ref{275337} and \ref{585124}, and so were felt to be a reasonable basis for the technology classification model that has been developed using functional linear regression. In particular, a regression fit made up of beta coefficient functions with many B-spline elements was found to provide a viable means of correctly matching the mode of substitution to the technology profile being evaluated when considering multiple 'goodness of fit' measures.
Permutation testing of the derived technology classification model further suggests that the regression fit is sensitive to the ordering of the expected mode labels relative to the technology time series being considered, so this relationship appears to be based on the specifics of the individual technology curves considered and does not appear to be occurring by chance. This implies that it may be possible to predict modes of substitution from limited bibliometric data during the earliest stages of technology development, provided that some evaluation of progress through the early stages of the Technology Life Cycle is made (this can be obtained using a nearest neighbour matching process, not discussed in this paper). Equally, this suggests that the functional data approach employed corroborates the earlier statistical rankings produced using Dynamic Time Warping, K-Medoids clustering, and leave-one-out cross-validation of the selected patent indicators, and that these two methods are compatible for this type of analysis.
It is also important to remember the potential limitations of this study that would need to be addressed for further confidence in the methodology used. Firstly, only a relatively small number of technologies have been evaluated in this study due to the time-consuming process required for data extraction, preparation, and identification of supporting evidence from literature for the assignment of expected classification labels. Consequently, whilst precautions have been taken to minimise the risk of model over-fitting, the cross-validation procedures employed would benefit from further verification with a more diverse spread of technologies to ensure that out-of-sample errors are accurately captured. Regression models based on small sample sizes can be very sensitive to the datasets they are calibrated to, so it cannot be ruled out that the results presented here are a better fit to the industries included in this analysis than a model that can necessarily be generalised to all technologies.
However, perhaps the most important note of caution regarding this work relates to the quantitative approaches used here. Whilst statistical approaches are well suited to detecting underlying correlations in historical and experimental datasets, they do not on their own provide a detailed understanding of the causation behind associated events, particularly in this case when considering the breadth of reasons for technological stagnations, 'failures', or presumptive leaps to occur. Equally, statistical methods are not generally well suited to predicting disruptive events and complex interactions, with other simulation techniques such as System Dynamics and Agent Based Modelling performing better in these areas. Accordingly, to identify causation effects and test the sensitivity of technological substitution patterns to variability arising from real-world socio-technical behaviours not captured in simple bibliometric indicators (such as the influence of competitive, organisational, and economic effects), the fitted regression model presented here also needs to be evaluated in a causal environment.
Similarly, in order to demonstrate practical applicability, the modes of substitution considered here need to be related to observed adoption characteristics (not discussed in this paper). Consequently, a System Dynamics model built on the regression functions identified in this study is proposed (although not discussed here) in order to calibrate these extracted technology profiles and mode predictions to empirical adoption data. This aims to explore more thoroughly the causal mechanisms relating early indicators of technological substitution to the eventual adoption patterns observed, and to provide a means of applying greater reasoning to the relationships identified here.