2. Materials and Methods

2.1 Strains and experimental overview

An isogenic culture of the recombinant S. cerevisiae strain C.U17 was grown in prolonged chemostat cultivations (Seresht et al., 2013). The strain contains a 2µm vector with an expression cassette containing the Triose Phosphate Isomerase 1 (TPI1) promoter and a gene encoding a single-chain insulin precursor. Both HIS3 and URA3 were used as auxotrophic selection markers. The C.U17 strain is referred to as the initial cell clone throughout the rest of the manuscript. After 271 hours of glucose-limited growth, cells from one of the cultivations were collected and stored as glycerol stocks at -80 °C. These cells are referred to as end sample cells. One of the glycerol stocks was thawed, washed three times in PBS buffer and used for fluorescence-activated cell sorting (FACS) of three individual populations separated based on particle size, measured as forward scatter light area (FSC-A) (Figure 1A). From each population, 10,000 sorted cells were propagated in individual shake flasks with minimal medium at 30 °C, harvested at an OD600 of 20 and stored as glycerol stocks at -80 °C (Figure 1B). The glycerol stocks were used to reinitiate new chemostat cultures with each of the FACS-sorted populations (Figure 1C).

2.2 Chemostat cultivations

Aerobic chemostat cultivations were performed in 0.5 L fully instrumented and automatically controlled BIOSTAT® reactors (Sartorius Stedim Biotech S.A, Germany). The strains were cultured in duplicate as previously described in Wright et al. (2020) at a temperature of 28 °C, a pH of 5.9, an aeration rate of 2 vvm and a dilution rate of 0.1 h⁻¹. A minimal medium with a glucose concentration of 75 g/L was used. Each cultivation was initiated with an 8-hour batch phase and a 52-hour fed-batch phase. Cell dry weight and extracellular heterologous insulin production were measured each day as previously described in Wright et al. (2020).
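As a rough orientation for the time scales involved, the dilution rate can be converted into a doubling time and an approximate number of generations. The sketch below assumes ideal steady-state chemostat behaviour, in which the specific growth rate equals the dilution rate; the function names are illustrative and not part of the original workflow.

```python
import math

def doubling_time_h(dilution_rate_per_h):
    """Doubling time in hours, assuming steady state (growth rate = dilution rate)."""
    return math.log(2) / dilution_rate_per_h

def generations(hours, dilution_rate_per_h):
    """Approximate number of generations elapsed during `hours` of chemostat growth."""
    return hours / doubling_time_h(dilution_rate_per_h)

D = 0.1  # dilution rate in h^-1, as stated in Section 2.2
print(round(doubling_time_h(D), 2))   # 6.93
print(round(generations(271, D), 1))  # 39.1
```

Under this assumption, the 271 hours of glucose-limited growth mentioned in Section 2.1 correspond to roughly 39 generations.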

2.3 Fluorescence-activated cell sorting (FACS)

Samples for flow cytometry analysis were collected every day and stored in glycerol at -80 °C. Prior to analysis, the samples were thawed and washed three times in PBS buffer. Flow cytometry analysis and cell sorting with respect to particle size (FSC-A) were performed using a Sony Cell Sorter SH800S. In each sample, 100,000 cells were analyzed using a 100 µm microfluidic sorting chip. The samples were diluted to obtain an event rate below 1000 events per second (eps). The raw flow cytometry data (fcs files) were analyzed in the software environment R version 3.6.1 using the flowCore package (Ellis et al., 2020). Stacked density plots of the log2(FSC-A) distribution at different time points were constructed using the ggplot2 package in R (Wickham, 2016). Histogram bar charts of log2(FSC-A) were constructed by sorting the cell count data into 833 uniformly sized bins using the built-in hist function in R.
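The binning step can be illustrated with a minimal sketch in Python. It is not the R code used in the study and does not reproduce the exact interval conventions of R's hist function; the function name and the handling of non-positive events are assumptions made for illustration.

```python
import math

def log2_histogram(fsc_a, n_bins=833):
    """Count log2(FSC-A) values in n_bins equal-width bins spanning the data range."""
    # Assumption: events with non-positive FSC-A cannot be log-transformed and are dropped.
    values = [math.log2(v) for v in fsc_a if v > 0]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            idx = 0  # all events identical: place them in the first bin
        else:
            # The maximum value would index one past the end, so clamp it into the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

edges, counts = log2_histogram([1, 2, 4, 8, 16], n_bins=4)
print(counts)  # [1, 1, 1, 2]
```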

2.4 Analysis of intracellular proteins

A minimum of four samples were withdrawn for analysis of intracellular proteins at different time points of the reinitiated chemostat cultivations of the three FACS-sorted populations (see Supplementary materials Table S1 for an overview of the different samples). The samples were stored at -80 °C before further processing. Intracellular proteins were quantified by label-free quantification as previously described in Wright et al. (2020). For analysis of the samples, liquid chromatography was performed on a CapLC system (Thermo Fisher Scientific) coupled to an Exploris 480 mass spectrometer (Thermo Fisher Scientific). The peptides were separated at a flow rate of 1.2 µl/min on a 75 µm × 15 cm, 2 µm C18 easy spray column. A stepped gradient from 4% to 40% acetonitrile in water over 50 minutes was applied. Mass spectrometry (MS)-level scans were performed with the following settings: Orbitrap resolution: 60,000; AGC target: 1.0e6; maximum injection time: 50 ms; intensity threshold: 5.0e3; and dynamic exclusion: 25 s. Data-dependent MS2 selection was performed in Top 12 mode with HCD collision energy set to 30% (AGC target: 1.0e4; maximum injection time: 22 ms).

2.5 Data processing of proteome data

The Thermo raw files were analyzed with Proteome Discoverer 2.3 (Thermo Fisher Scientific). The following settings were used for the analysis: fixed modification: carbamidomethyl (C); variable modification: oxidation of methionine residues; first search mass tolerance: 20 ppm; MS/MS tolerance: 20 ppm. Trypsin was selected as the enzyme, allowing one missed cleavage. The false discovery rate was set at 0.1%. The data were searched against the S. cerevisiae database retrieved from UniProt (proteome ID UP000002311) and the sequence of the heterologous insulin. The data sets can be found at data.dtu.dk (https://doi.org/10.11583/DTU.13536179).
Batch variations between different proteome datasets were reduced by scaling each protein such that the mean log2(abundance) was the same between data sets. A differential expression analysis was performed between Population 1, Population 2 and Population 3 for samples taken at the beginning of the cultures (≤ 48 hours of chemostat growth) and again at the end of the cultures (after 254 hours of chemostat growth). Only proteins that were measured in all samples of the compared populations were included in the analysis, meaning that 2635 proteins were compared between Population 1 and Population 3, 2811 proteins between Population 1 and Population 2, and 2679 proteins between Population 2 and Population 3. The analysis was performed using the edgeR package (Robinson, McCarthy, & Smyth, 2010) in R version 3.6.1. The proteomes from the beginning of the chemostat cultures with the three populations were furthermore compared to a previously published proteome of the initial cell clone (Wright et al., 2020). For an overview of the samples used for the comparison, see Supplementary materials Table S2. In these comparisons, 2716 proteins were compared between Population 1 and the initial cell clone, 2770 proteins between Population 2 and the initial cell clone, and 2692 proteins between Population 3 and the initial cell clone.
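One possible reading of the batch-scaling and filtering steps is sketched below in Python. The text does not specify the exact procedure, so both the shift towards the mean of the per-dataset means and the data layout (a dictionary of datasets, each mapping a protein to its log2 abundances, with None for missing values) are assumptions made for illustration only.

```python
from statistics import mean

def align_batches(batches):
    """Shift each protein's log2 abundances so its mean is equal across datasets.

    batches: {dataset_name: {protein: [log2 abundance per sample, or None]}}
    Only proteins quantified in every sample of every dataset are retained.
    """
    complete = [
        {p for p, vals in b.items() if all(v is not None for v in vals)}
        for b in batches.values()
    ]
    shared = set.intersection(*complete)
    out = {name: {} for name in batches}
    for prot in shared:
        batch_means = {name: mean(b[prot]) for name, b in batches.items()}
        target = mean(batch_means.values())  # assumed common target: mean of dataset means
        for name, b in batches.items():
            shift = target - batch_means[name]
            out[name][prot] = [v + shift for v in b[prot]]
    return out

example = {
    "a": {"P1": [10.0, 12.0], "P2": [5.0, None]},
    "b": {"P1": [13.0, 15.0], "P2": [6.0, 6.0]},
}
print(align_batches(example)["a"])  # {'P1': [11.5, 13.5]}
```

An additive shift on the log2 scale corresponds to a multiplicative scaling of the raw abundances, which is why the sketch adds a constant rather than multiplying.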
For each comparison between two strains, proteins were grouped in clusters depending on whether the level of the protein was higher (log2 fold-change > 0.5, q-value < 0.05) or lower (log2 fold-change < -0.5, q-value < 0.05) in strain A compared to strain B. Gene ontology (GO) process terms were obtained from geneontology.org/annotations/sgd.gaf.gz on 16 November 2020. A one-sided Fisher's exact test was used to investigate whether the protein clusters were enriched with proteins annotated with certain GO process terms (q-value < 0.05). The test was performed using the R package bc3net (de Matos Simoes, Tripathi, & Emmert-Streib, 2012).
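The clustering rule and the enrichment test can be sketched as follows. This is an illustrative Python version, not the bc3net implementation used in the study; the function names are hypothetical, the cut-offs are assumed to be symmetric around zero, and the multiple-testing correction that yields q-values is not included.

```python
from math import comb

def classify(log2_fc, q, fc_cut=0.5, q_cut=0.05):
    """Assign a protein to the 'higher', 'lower' or 'unchanged' cluster."""
    if q < q_cut and log2_fc > fc_cut:
        return "higher"
    if q < q_cut and log2_fc < -fc_cut:  # assumed symmetric lower threshold
        return "lower"
    return "unchanged"

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value (hypergeometric upper tail).

    k: annotated proteins in the cluster, n: cluster size,
    K: annotated proteins in the background, N: background size.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

print(classify(0.8, 0.01))           # higher
print(fisher_enrichment_p(3, 3, 3, 6))  # 0.05
```

In the second call, all three proteins of a three-protein cluster carry a term that annotates three of six background proteins, so only one of the 20 possible clusters is at least as extreme.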

2.6 Microscopy

The morphology of the FACS-sorted populations was visually inspected using a LMI-005-Leica Microscope and a Confocal Microscope-SP8.

2.7 Determination of maximum growth rate in batch cultures

Maximum growth rates were determined based on OD600 measurements from exponentially growing cells in 100 ml shake flasks with minimal medium and 3 % v/v glucose (Seresht et al., 2013). Three biological replicates were performed for each strain.
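The text does not state how the growth rate was extracted from the OD600 readings. A common approach, shown here as an assumed sketch rather than the authors' procedure, is to take the slope of a least-squares fit of ln(OD600) against time over the exponential phase; the function name and the example data are hypothetical.

```python
import math

def max_growth_rate(times_h, od600):
    """Slope (h^-1) of a least-squares line through ln(OD600) versus time."""
    y = [math.log(v) for v in od600]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times_h, y))
    return sxy / sxx

# Synthetic example: perfectly exponential growth at 0.4 h^-1
times = [0, 1, 2, 3]
ods = [0.1 * math.exp(0.4 * t) for t in times]
print(round(max_growth_rate(times, ods), 3))  # 0.4
```

Only readings from the exponential phase should be passed in, since lag-phase or stationary-phase points would bias the slope downwards.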