Identification of smoothing parameter values for regression coefficients

With the functional data objects for each model component now ready, a cell array containing each model component, along with a constant predictor term, is generated for use in the functional linear regression. Before the final regression analysis can be run, a smoothing parameter has to be selected for the basis system of the regression coefficient functions (the beta basis system). This is achieved by calculating leave-one-out cross-validation scores (i.e. error sum of squares values) for functional responses over a range of smoothing parameter values, as per sections 9.4.3 and 10.6.2 of \cite{Ramsay_2009}. The functional parameter object used in the beta basis system is then redefined using the smoothing parameter value identified, so that the functional linear regression analysis converges on a model with the best chance of performing well out-of-sample.
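
The selection procedure can be sketched in a few lines. The following is a minimal illustration and not the code used in this study. It assumes a scalar response and assumes that the functional predictors have already been reduced to a design matrix \texttt{Z} (a constant column followed by the inner products of each predictor with the beta basis functions). It also swaps the functional data objects for plain NumPy arrays and uses a second-difference roughness penalty. All function names are hypothetical.

```python
import numpy as np


def roughness_penalty(n_basis):
    """Penalty matrix for an intercept plus n_basis coefficients (assumed form).

    The intercept (first entry) is left unpenalised; squared second
    differences of the basis coefficients are penalised.
    """
    D = np.diff(np.eye(n_basis), n=2, axis=0)
    R = np.zeros((n_basis + 1, n_basis + 1))
    R[1:, 1:] = D.T @ D
    return R


def fit_penalized(Z, y, R, lam):
    """Minimise ||y - Z b||^2 + lam * b' R b over the coefficient vector b."""
    return np.linalg.solve(Z.T @ Z + lam * R, Z.T @ y)


def loocv_sse(Z, y, R, lam):
    """Leave-one-out cross-validation error sum of squares for one lambda."""
    n = len(y)
    sse = 0.0
    for i in range(n):
        keep = np.arange(n) != i              # drop observation i
        b = fit_penalized(Z[keep], y[keep], R, lam)
        sse += float((y[i] - Z[i] @ b) ** 2)  # squared error on the held-out case
    return sse


def select_lambda(Z, y, R, lambdas):
    """Return the lambda with the smallest cross-validation score, plus all scores."""
    scores = [loocv_sse(Z, y, R, lam) for lam in lambdas]
    best = int(np.argmin(scores))
    return lambdas[best], scores


if __name__ == "__main__":
    # Synthetic data only; 20 observations mirrors the number of technologies.
    rng = np.random.default_rng(0)
    n_obs, n_basis = 20, 8
    Z = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, n_basis))])
    y = rng.normal(size=n_obs)
    lambdas = 10.0 ** np.arange(-4, 5)        # log-spaced candidate grid
    best, scores = select_lambda(Z, y, roughness_penalty(n_basis), lambdas)
    print("selected lambda:", best)
```

Refitting the model once per held-out observation keeps the logic explicit. For larger data sets the same score can be obtained from a single fit using the diagonal of the hat matrix.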

Results and Discussion

The functional linear regression analysis is now run with the identified smoothing parameters and the scalar response variables to estimate the \(\beta_i\) coefficient functions and their corresponding variance, which is used to define the 95\% confidence bounds (see sections 9.4.3 and 9.4.4 of \cite{Ramsay_2009} respectively). Fig. \ref{820059} to Fig. \ref{942889} show the resulting \(\beta_i\) coefficients and confidence bounds for the number of non-corporates and the number of cited references by priority year, when considering the emergence phase of development and using a high-dimensional regression fit (i.e. when the beta basis system for each regression coefficient is made up of a large number of B-splines). This regression fit correctly identifies the mode of substitution, from the patent data available in the emergence stage, for 19 of the 20 technologies considered.
From the confidence bounds on these plots it can be seen that, for both the number of non-corporates and the number of cited references by priority year, the variance is highest at the start of the emergence phase. This is not entirely surprising: the start of the emergence phase is often when the least data is available for comparing technologies, and so represents the point of greatest uncertainty. However, Fig. \ref{822351} and Fig. \ref{942889} also illustrate how the influence these two patent dimensions have on the predicted mode of substitution varies with time during the emergence phase. More specifically, a deviation away from zero in a coefficient function equates to an increased positive or negative weighting for the associated patent indicator count at that moment in time when determining the predicted mode of substitution. As such, any patent indicator counts at \(t = 0\) for the number of non-corporates by priority year (assuming these are present) will have a comparatively large influence on the final mode of substitution predicted. Equally, these particular regression results suggest that the impact of non-corporate activity next peaks around 40\% of the way through the emergence phase (potentially corresponding to the hype effect suggested previously), and again at the end of the emergence phase. For the number of cited references by priority year, this regression model suggests that the times of greatest impact on the mode of substitution are at the very beginning and the very end of the emergence stage. Whilst these coefficient plots give some indication of the relative weighting applied to patent indicator counts as time progresses, the cumulative nature of the inner products used in Eq. \ref{eq:linear_regression} makes it difficult to infer visually, from these plots alone, which mode the technology under evaluation is currently converging towards.
For this it is also necessary to include the corresponding patent indicator count values that these coefficient terms are multiplied by for the specific technology being assessed.
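
To illustrate why the indicator counts are needed alongside the coefficient functions, the running contribution of one predictor can be computed numerically. The sketch below assumes that each inner product in Eq. \ref{eq:linear_regression} takes the standard form \(\int x_i(t)\,\beta_i(t)\,dt\) and that both functions have been evaluated on a common time grid. The function name and the trapezoidal approximation are illustrative choices, not part of the original analysis.

```python
import numpy as np


def cumulative_contribution(t, x, beta):
    """Running integral of x(s) * beta(s) from t[0] up to each grid point.

    t    : increasing time grid over the emergence phase
    x    : patent indicator counts evaluated on t (hypothetical input)
    beta : regression coefficient function evaluated on t
    """
    integrand = x * beta
    # Trapezoidal area of each grid interval, accumulated from the start.
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)
    return np.concatenate([[0.0], np.cumsum(steps)])


if __name__ == "__main__":
    # Synthetic example: normalised time, linearly growing counts,
    # constant coefficient function.
    t = np.linspace(0.0, 1.0, 101)
    x = t
    beta = np.full_like(t, 2.0)
    running = cumulative_contribution(t, x, beta)
    print("contribution accumulated by mid-phase:", running[50])
    print("contribution over the whole phase:", running[-1])
```

Plotting such a running total for each predictor, and summing across predictors together with the constant term, would show how the predicted response for a given technology builds up over the emergence phase.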