3.2 Similarity in gene expression among samples
Similarity in gene expression among biological replicates (i.e.,
individuals belonging to the same treatment group) gives an idea of
the reproducibility of our data and of the overall variation among samples.
Similarity in gene expression within and among groups can be estimated
using the sample correlation or Euclidean distances (see Materials and
Methods for further details). Pearson correlation coefficients (r) for
biological replicates were above 0.9 for the majority of comparisons
(same tissue and same group), with only a few pairwise comparisons
having values 0.8<r<0.9 (Supporting Information
Table S2). Lower values near 0.8 were mostly due to one sample (blood,
group 2) being different from the rest. This indicates that although
variation in gene expression occurs among individuals, biological
replicates are generally very similar.
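
As an illustration of the two similarity measures mentioned above, the following minimal Python sketch computes Pearson r and Euclidean distance for every pair of samples. It is not the pipeline used in this study (see Materials and Methods for that); the sample names, the expression values, and the helper functions `pearson_r`, `euclidean`, and `pairwise` are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def euclidean(x, y):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise(samples, metric):
    """Apply `metric` to every unordered pair of samples (keys sorted by name)."""
    names = sorted(samples)
    return {
        (a, b): metric(samples[a], samples[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical transformed expression values for five genes (illustrative only).
samples = {
    "blood_g1_rep1": [5.1, 2.0, 7.3, 0.5, 3.2],
    "blood_g1_rep2": [5.0, 2.2, 7.1, 0.7, 3.0],
    "blood_g2_rep1": [4.2, 3.1, 6.0, 1.9, 2.1],
}

r_values = pairwise(samples, pearson_r)
distances = pairwise(samples, euclidean)

for pair in r_values:
    print(pair, round(r_values[pair], 3), round(distances[pair], 3))
```

In this toy example, the two replicates from the same group have a higher r and a smaller distance than either has with the sample from the other group, which is the pattern the comparisons in Table S2 are meant to reveal.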
Pearson r values among groups for each tissue type are slightly lower
than those obtained for individuals belonging to the same group, but
generally above 0.85 and with the majority of pairwise comparisons being
above 0.90 (Supporting Information Table S2), indicating comparable
levels of gene expression across tested groups for the same tissue.
In this case too, comparisons involving the sample mentioned above (blood,
group 2) have lower r values (>0.73) (Supporting Information Table S2). Pearson
r values between the two sequencing platforms for whole mRNA-Seq samples
(hereafter referred to as NEB) are all above 0.87 except for the comparisons
involving the blood sample from group 2 (>0.77) (Supporting
Information Table S2), indicating that different sequencing methods did
not influence the number of uniquely mapped reads. Finally, r values among
different tissues (for 3’ Tag-Seq) and between 3’ Tag-Seq and NEB are
generally <0.5 and sometimes negative, suggesting different
levels of gene expression among tissues and, for the same mapped genes,
between the two library methods.
Heatmaps of the distance matrices for the different group comparisons
provide hierarchical clustering based on sample distances. When heatmaps
were made combining data from the three different tissues for 3’
Tag-Seq, we found three clusters corresponding to the three different
tissues (Figure 1a). However, within each cluster, as also shown by the
heatmaps built with data from each tissue separately, samples belonging
to different groups are clustered together, indicating no clear
difference in gene expression among the tested groups (Supporting
Information Figure S2). A lack of differences in gene expression among the
different groups was also found using NEB data (Figure 1). Finally, the
comparison of 3’ Tag-Seq vs. NEB revealed differences in gene expression
between the two methods; these differences were, however, not associated with
any of the groups (Figure 1). Principal component analysis (PCA),
another way to visualize variation in gene expression among samples,
further supports both the lack of differences among sampling methods and
times of tissue harvesting and the differentiation between 3’ Tag-Seq and
NEB and among the three sampled tissues (Figure 2 and Supporting
Information Figure S3).
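
To make the clustering behind the heatmaps concrete, the sketch below applies a simple agglomerative procedure to sample-to-sample Euclidean distances. It is only an illustration under stated assumptions: average linkage is assumed here (the linkage actually used is not specified in this section), the tissue names `tissueA`, `tissueB`, and `tissueC`, the expression values, and the function `average_linkage` are hypothetical, and the code is not the software used to produce Figure 1.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_linkage(samples, n_clusters):
    """Merge the two closest clusters until only `n_clusters` remain.

    The distance between two clusters is the mean of all pairwise
    sample distances between them (average linkage, an assumption).
    """
    clusters = [[name] for name in sorted(samples)]

    def cluster_distance(c1, c2):
        total = sum(euclidean(samples[a], samples[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return [sorted(c) for c in clusters]


# Hypothetical expression values: large differences among tissues,
# small differences between groups (g1, g2) within a tissue.
samples = {
    "tissueA_g1": [9.0, 1.0, 1.0],
    "tissueA_g2": [8.8, 1.2, 0.9],
    "tissueB_g1": [1.0, 9.0, 1.1],
    "tissueB_g2": [1.2, 8.9, 1.0],
    "tissueC_g1": [1.0, 1.1, 9.0],
    "tissueC_g2": [0.9, 1.0, 9.2],
}

clusters = average_linkage(samples, n_clusters=3)
for members in sorted(clusters):
    print(members)
```

With these toy values, samples group by tissue and the two groups within each tissue fall in the same cluster, mirroring the pattern described for Figure 1a.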