To begin with, this procedure removes any non-breaking space characters from tabulated cells (which would otherwise interfere with later citation counts), removes leading and trailing whitespace from column headings, and converts column headings into valid MATLAB variable names (i.e. replacing any spaces, slashes, brackets, full stops or hyphens with underscores). The individual column headings are then used to define variables in a MATLAB table, and a generic 'counter' variable is appended to the end of the table to enable the subsequent construction of pivot tables. Having transformed the data into a recognised table format, the script (provided in Appendix B) identifies the records that already have valid dates recorded against them. Unique corporation, non-corporation, and inventor IDs are then appended to the tabulated records, based on the similarity between entries in these three fields. Next, the script cycles through each record in turn and extracts the application and priority dates for each patent family, where present; where multiple priority dates are listed against a single record, all of them are scanned and the earliest is retained. At this point, the script also counts the number of references and the number of patents cited against each record, whilst mapping every IPC classification listed against the record to the corresponding IPC subclass tally (based on \cite{Inventory_of_ever_used_IPC_symbols}). This allows the number of distinct IPC subclasses recorded for a given patent family to be counted; these tallies are later combined with those of every other patent family record to rank, for each year, the top 5 and top 10 IPC subclasses most heavily associated with the developing technology.
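The cleaning and per-record extraction steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the MATLAB script in Appendix B. Priority dates are assumed to appear as eight-digit `YYYYMMDD` tokens. IPC codes are assumed to be written in the usual form `H01M-008/04`, meaning section letter, two-digit class, subclass letter, then group. All function names are hypothetical. Within MATLAB itself, `matlab.lang.makeValidName` performs a comparable heading conversion.

```python
import re
from collections import Counter
from datetime import datetime

NBSP = "\u00a0"  # non-breaking space

# Characters the text says are replaced: spaces, slashes, brackets, full stops, hyphens.
HEADING_CHARS = re.compile(r"[ /\\()\[\]{}.\-]")
# IPC subclass: section letter A-H, two-digit class, subclass letter (e.g. "H01M").
IPC_SUBCLASS = re.compile(r"\b([A-H]\d{2}[A-Z])")


def clean_cell(value):
    """Turn non-breaking spaces into ordinary spaces and trim; empty cells become None."""
    if value is None:
        return None
    text = str(value).replace(NBSP, " ").strip()
    return text or None


def sanitize_heading(heading):
    """Trim a column heading and replace the listed characters with underscores."""
    return HEADING_CHARS.sub("_", heading.replace(NBSP, " ").strip())


def earliest_priority_date(field):
    """Earliest valid date among YYYYMMDD tokens in a priority field (assumed format)."""
    dates = []
    for token in re.findall(r"\b(\d{8})\b", field or ""):
        try:
            dates.append(datetime.strptime(token, "%Y%m%d").date())
        except ValueError:
            continue  # eight digits that are not a real calendar date
    return min(dates) if dates else None


def ipc_subclasses(field):
    """Distinct IPC subclasses referenced in one record's IPC field."""
    return set(IPC_SUBCLASS.findall(field or ""))


def top_ipc_subclasses(ipc_fields, n):
    """Count each subclass once per record, then return the n most frequent."""
    tally = Counter()
    for field in ipc_fields:
        tally.update(ipc_subclasses(field))
    return [code for code, _ in tally.most_common(n)]
```

Because each record's subclasses are collected into a set before tallying, the ranking reflects how many patent families touch a subclass, not how often that subclass is repeated within a single family.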
For those records where valid dates were not located in the previous steps (typically fewer than 5% of records), the script then checks whether any alternative dates are present in the 'Basic Year', 'Application Year', or 'Priority Year' fields. 'Priority Year' should always be the earliest of these dates. It marks the earliest filing for the invention and so best represents the original conception of the idea, rather than the date on which a subsequent application was filed with a particular patent office. All dates are also checked to ensure that none are earlier than 1790, the year in which the first US patent was granted; any date recorded before this year is very likely to be an error. Once missing dates have been imputed where possible, the script determines the range of years spanned by the records in the current batch and extends the global time frame for the current technology as required. The bibliometric indicator counts specified in Table 2 are then compiled for each year covered by the current batch, which is marked as completed before the steps above are repeated for the next batch of records. In this way, a summary indicator count table is built for each batch of records. These tables are then combined into one overall summary table for the technology being considered. Each batch's results are padded with zero-count rows for any years in which it has no records, so that the same set of years is present when corresponding table rows are added together. To verify that the MATLAB data extraction and cleaning processes were functioning as intended, the output counts of the MATLAB scripts for several sample batches were compared with those of an equivalent process implemented using Excel pivot tables.
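The date fallback, time-frame update, and zero-padded merging of batch tables can be sketched as below. This is a hypothetical Python illustration rather than the author's MATLAB implementation. The field names `Priority_Year`, `Application_Year` and `Basic_Year` are assumed sanitised forms of the headings mentioned above. Each batch table is assumed to be a dictionary mapping year to indicator counts. Taking the earliest plausible fallback year is an assumption, chosen to be consistent with the rule that the priority year should be the earliest.

```python
EARLIEST_PLAUSIBLE_YEAR = 1790  # first US patent; earlier years treated as errors
# Hypothetical sanitised column names for the fallback date fields.
FALLBACK_FIELDS = ("Priority_Year", "Application_Year", "Basic_Year")


def fallback_year(record):
    """Earliest plausible year among the fallback fields of one record, or None."""
    years = []
    for field in FALLBACK_FIELDS:
        try:
            year = int(record.get(field))
        except (TypeError, ValueError):
            continue  # missing or non-numeric entry
        if year >= EARLIEST_PLAUSIBLE_YEAR:
            years.append(year)
    return min(years) if years else None


def update_time_frame(frame, batch_years):
    """Extend a (first_year, last_year) frame to cover the years of a new batch."""
    first, last = min(batch_years), max(batch_years)
    if frame is None:
        return (first, last)
    return (min(frame[0], first), max(frame[1], last))


def merge_batches(batches, indicators, frame):
    """Sum per-batch {year: {indicator: count}} tables over the global frame.

    Years absent from a batch are treated as zero-count rows, so every batch
    contributes a row for every year before rows are added together.
    """
    years = range(frame[0], frame[1] + 1)
    summary = {year: {name: 0 for name in indicators} for year in years}
    for batch in batches:
        for year in years:
            row = batch.get(year, {})  # zero-count row if the batch lacks this year
            for name in indicators:
                summary[year][name] += row.get(name, 0)
    return summary
```

Padding every batch to the full global year range before summing means that a year with no records in one batch contributes zeros instead of being dropped or misaligned.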
This comparison showed that, in some instances where formatting issues were present, the MATLAB scripts were more successful than Excel at filtering out blank values, but that in both cases the overall counts corresponded closely to those expected.