1 Introduction
RNA sequencing (RNA-Seq) is increasingly common in ecological and
evolutionary studies focusing on variation in gene expression (Alvarez
et al. 2014, Conesa et al. 2016, Ekblom & Galindo 2011). For example,
RNA-Seq is commonly used in studies of physiology, conservation, and
epigenetics, and to assess organismal responses to environmental
variables (Todd et al. 2016, Corlett 2017, Rey et al. 2020). RNA-Seq is
highly accurate for quantifying expression levels, requires less RNA
than microarrays, does not necessarily require a
reference genome (e.g., Cahais et al. 2012), can uncover sequence
variation in transcribed regions, and shows high reproducibility (Wang
et al. 2009). However, gene expression data can be strongly influenced
by biological and non-biological factors such as experimental and
stochastic variation (Auer & Doerge 2010, Qian et al. 2014, Todd et al.
2016). Given the recent surge in RNA-based studies, it is critical to
identify and quantify sources of variation in gene expression.
Sampling methods can be an important experimental cause of variation in
estimated gene expression (Mutch et al. 2008, Passow et al. 2019). Delay
in sample preservation after collection may result in higher RNA
degradation and introduce bias in estimated gene expression (e.g.,
Gayral et al. 2011, Romero et al. 2014). This is a consequence of mRNAs
being produced in relatively short bursts in response to internal or
external stimuli and having short half-lives (Ross 1995, Staton et al.
2000). Similarly, the choice of anesthetic, tissue preservation method,
and RNA extraction method, as well as the time between
sample collection and RNA isolation, can all affect RNA quality and gene
expression (e.g., Debey et al. 2004, Huitink et al. 2010, Jeffries et al.
2014, Mutter et al. 2004, Olsvik et al. 2007, Passow et al. 2019).
Stochastic variation in gene expression, arising from variation in cellular and
molecular processes, can produce random differences among individuals
of the same population for the same genes that are not necessarily
a consequence of biological (e.g., maternal effects and potentially
heritable variation) or micro-environmental variation. In studies with
few biological replicates, this variation may be
misinterpreted as biologically relevant (Hansen et al. 2011, Kaern et
al. 2005). Detection of stochastic variation in gene expression may be
achieved through careful sampling design (e.g., individuals differ in only
one treatment) and by increasing the number of sampled individuals (Kim
et al. 2015, Liu et al. 2014) to gain statistical power (Ching et al.
2014). However, RNA-Seq experiments are often limited in the number of
sampled individuals due to cost, with consequent loss of statistical
power and potentially misleading results (Bi & Liu 2016, Li et al.
2013).
Independent of sample size, library construction and RNA sequencing
techniques may also produce variability in estimated gene expression.
Whole mRNA sequencing methods often result in fragment length bias
because longer transcripts are sheared into more fragments, so
more reads are assigned to them than to shorter
transcripts, causing an overrepresentation of longer transcripts in
sequencing libraries (Ma et al. 2019, Oshlack & Wakefield 2009, Roberts
et al. 2011). Cost limitations as well as the fragment size bias of whole
mRNA sequencing have led to the development of RNA sequencing library
construction protocols that allow processing a larger number of samples
in a more cost-effective manner (Meyer et al. 2011, Morrissy et al.
2009, Wu et al. 2010). The 3’ RNA Tag-Seq method (also known as
Quant-Seq 3’ mRNA-Seq), for example, only primes the 3’ poly-A tail,
reducing the sequencing effort and cost, and generates an essentially
uniform distribution of fragments with respect to original RNA length
(Lohman et al. 2016, Ma et al. 2019).
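The length bias described above can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which, for whole mRNA sequencing, the expected number of reads from a transcript is proportional to its copy number multiplied by its length, whereas for a 3’ tag-based library it is proportional to copy number alone. The function name, transcript lengths, copy numbers, and read depth are hypothetical values chosen for illustration and are not taken from any dataset; real libraries are further shaped by factors such as fragment size selection and mapping efficiency.

```python
def expected_reads(copies, lengths, total_reads, tag_based=False):
    """Expected read count per transcript under a simplified sampling model.

    Whole-transcript libraries: sampling weight = copies * length, because a
    longer transcript is sheared into more fragments.
    Tag-based libraries: sampling weight = copies, because each molecule
    contributes a single 3' tag regardless of its length.
    """
    if tag_based:
        weights = list(copies)
    else:
        weights = [c * l for c, l in zip(copies, lengths)]
    total_weight = sum(weights)
    return [total_reads * w / total_weight for w in weights]


# Three hypothetical transcripts present at identical copy number.
lengths = [500, 1000, 4000]   # transcript lengths in nucleotides (assumed)
copies = [100, 100, 100]      # equal expression (assumed)
depth = 33000                 # total reads sequenced (assumed)

whole = expected_reads(copies, lengths, depth)
tag = expected_reads(copies, lengths, depth, tag_based=True)

for length, w, t in zip(lengths, whole, tag):
    print(f"length={length:>5} nt  whole-mRNA reads={w:>8.0f}  3' tag reads={t:>8.0f}")
```

Under these assumptions the 4000-nucleotide transcript receives eight times as many whole-mRNA reads as the 500-nucleotide transcript despite equal expression, while the tag-based counts are identical across the three transcripts.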
In fish, RNA-Seq data are commonly used to investigate the effects of
environmental variables (e.g., temperature, hypoxia) on gene expression
(e.g., Krishnan et al. 2020, Long et al. 2015, Meyer et al. 2011, Smith
et al. 2013, Wang et al. 2015). However, little is known about the
influence of different sampling techniques on gene expression in fish,
especially under field conditions. For example, field conditions may
limit the use of optimal sampling protocols or storage methods to reduce
variation (e.g., using liquid nitrogen in remote sampling locations or
fast processing times for tissue isolation) (Mutter et al. 2004,
Pérez‐Portela & Riesgo 2013). Furthermore, field capture may also
result in increased variation among individuals, including among
biological replicates (Pearce et al. 2016). For example, stress-related
genes may be overexpressed as a result of long handling time before
sampling. The impacts of handling stress on fish physiology are well
understood (Sopinka et al. 2016). Although most studies focus on
glucocorticoid and blood chemistry responses to capture (Milla et al.
2010, Wiseman et al. 2007, Wood et al. 1983, Milligan 1996, Barton 2002,
Ruane et al. 2001; see also Romero & Reed 2005 for the influence of
handling time on non-fish species), gene expression responses to
handling stress indicate that the magnitude, intensity, and duration of
changes vary across genes, species, and tissue types (Krasnov et al.
2005, Lopez et al. 2014). While there is some evidence that a
specimen’s blood cortisol and glucose levels are affected by capture
method (e.g., electrofishing; Barton & Dwyer 1997,
Barton & Grosh 1996, Bracewell et al. 2004), to our knowledge it is
unknown whether gene
expression is affected by capture method or handling time prior to
sample collection.
Here, we test whether sampling method (electrofishing vs. dip netting),
processing time, and RNA-Seq library type (3’ Tag-Seq vs. whole
mRNA-Seq) influence gene expression data in multiple tissue types from
westslope cutthroat trout (Oncorhynchus clarkii lewisi), a
species of conservation concern native to western North America (Behnke
2002, Allendorf & Leary 1988, Shepard et al. 2003). The results of
this study will address the sources of gene expression variation under
field conditions and provide a foundation for improving future RNA-based
study designs for field sampling of wild-caught non-model fish and other
species.