Materials and Methods
Construction of human transcription factor and miRNA regulatory networks
The human transcription factor (TF) and miRNA regulatory networks were built
by integrating miRTarBase, TarBase, TRANSFAC and TransmiR 14-16. These four
databases contain curated interactions among human TFs, miRNAs, and target
genes. Gene and miRNA names within the regulatory networks were standardized
using data from the NCBI and miRBase databases. Additionally, all regulatory
relationships within the regulatory network were literature-supported.
In total, there were 460 TFs, 2,434 miRNAs, 13,898 target genes and
98,894 edges in the regulatory network.
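The integration step described above can be sketched as a union of edge lists after name standardization. This is a minimal illustration, not the authors' pipeline: the source labels, the alias map, and all node names below are hypothetical toy values, and the real databases use their own record formats.

```python
def standardize(name, alias_map):
    """Map an alias to its official symbol; unknown names pass through unchanged."""
    return alias_map.get(name, name)


def merge_edges(sources, alias_map):
    """Union the edges of several sources after name standardization.

    sources: dict mapping a source label to an iterable of (regulator, target) pairs.
    Returns a dict mapping each unique edge to the set of sources that report it.
    """
    merged = {}
    for label, edges in sources.items():
        for regulator, target in edges:
            edge = (standardize(regulator, alias_map), standardize(target, alias_map))
            merged.setdefault(edge, set()).add(label)
    return merged


# Toy data (hypothetical names, not records from the real databases).
alias_map = {"mir1_alias": "MIR_1", "tfA_alias": "TF_A"}
sources = {
    "source_A": [("mir1_alias", "GENE_X"), ("tfA_alias", "GENE_Y")],
    "source_B": [("MIR_1", "GENE_X"), ("TF_A", "MIR_2")],
}

network = merge_edges(sources, alias_map)
nodes = {node for edge in network for node in edge}
print(len(nodes), "nodes,", len(network), "edges")
```

Keeping the set of supporting sources per edge makes it easy to see which interactions are reported by more than one database after aliases are resolved.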
Known HCC and HCV-associated genes and miRNAs
DisGeNET,
a discovery platform containing one of the largest publicly available
collections of genes and variants associated with human diseases, was
used to identify genes associated with the two diseases 17.
miRNAs associated with the two diseases were collected from miR2Disease 18 and HMDD 19, which are curated
databases containing experimental evidence for human microRNA (miRNA)
and disease associations. We also utilized genes in the KEGG pathways
associated with HCC (168) or HCV (155). We included 30 known
HCC-associated genes in DisGeNET and 463 known HCC-associated miRNAs
from either miR2Disease or HMDD. Finally, 18 known HCV-associated genes
in DisGeNET and 100 known HCV-associated miRNAs in either miR2Disease or
HMDD were used for network analysis.
Disease-related network construction
The disease-related network was constructed on the premise that the closer
a node lies to the known disease genes in the network, the more likely it
is to be disease-associated 20. To obtain a more closely related
subnetwork, we selected the nodes directly connected to the known
disease-associated genes in the background network to build an HCC- and
HCV-related network. In total, there were 409 TFs, 2,300 miRNAs,
10,697 target genes and 48,423 edges in this regulatory network.
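The first-neighbor selection described above can be sketched as follows. The sketch assumes that "directly connected" covers both incoming and outgoing edges and that the resulting network keeps every background edge whose two endpoints were both retained; neither detail is stated in the text, and the function and node names are hypothetical.

```python
def first_neighbor_subnetwork(edges, seeds):
    """Return the edges among seed nodes and their direct neighbors.

    edges: list of directed (source, target) pairs from the background network.
    seeds: known disease-associated genes or miRNAs.
    """
    seeds = set(seeds)
    keep = set(seeds)
    for source, target in edges:
        # Assumption: a neighbor in either direction counts as directly connected.
        if source in seeds:
            keep.add(target)
        if target in seeds:
            keep.add(source)
    return [(s, t) for s, t in edges if s in keep and t in keep]


# Toy background network (hypothetical node names).
background = [
    ("TF1", "G1"),
    ("miR1", "G1"),
    ("TF1", "G2"),
    ("G2", "G3"),
    ("TF2", "G3"),
]
disease_network = first_neighbor_subnetwork(background, seeds={"G1"})
print(disease_network)
```

The same helper could be reused with differentially expressed genes as seeds when dataset-specific subnetworks are extracted from the disease-related network.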
Differentially expressed genes in the three datasets
The normalized mRNA expression profiles of HCC (TCGA), HCV (GSE15387)
and HCV-related HCC (GSE44074) were downloaded from the
Gene Expression Omnibus (GEO) database 21 and The
Cancer Genome Atlas (TCGA) database 22. There were 374
HCC samples and 50 normal samples in the TCGA data set, 35 HCV-related
HCC samples and 37 HCC samples in the GSE44074, as well as 60 HCV
samples and 60 normal samples in the GSE15387. For mRNA expression data,
probe sets were mapped to Entrez Gene IDs. When multiple probes
corresponded to the same gene, the mean expression value of these probes
was used to represent the gene expression level. We obtained 2, 3, and 4
differentially expressed genes in the three datasets at p-values below
0.05, using edgeR (TCGA data) and SAM (GEO data).
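The probe-to-gene step described above (averaging all probes that map to the same Entrez Gene ID) can be sketched in a few lines. The probe names, gene IDs, and values are hypothetical, and dropping unmapped probes is an assumption that the text does not state.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Average the per-sample values of all probes that map to the same gene.

    probe_values: dict mapping probe ID to a list of expression values (one per sample).
    probe_to_gene: dict mapping probe ID to a gene ID.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # assumption: probes without a gene mapping are dropped
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column across the probes of one gene.
    return {gene: [mean(column) for column in zip(*rows)] for gene, rows in grouped.items()}


# Toy data (hypothetical probe and gene IDs, two samples).
probe_values = {
    "probe_1": [2.0, 4.0],
    "probe_2": [4.0, 8.0],
    "probe_3": [1.0, 1.0],
    "probe_4": [9.0, 9.0],  # has no gene mapping below
}
probe_to_gene = {"probe_1": "1001", "probe_2": "1001", "probe_3": "1002"}

gene_values = collapse_probes(probe_values, probe_to_gene)
print(gene_values)
```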
Identification of the subnetworks for each dataset
To construct subnetworks for each dataset, we extracted differentially
expressed genes and their neighbor genes from the disease-related
network. The regulatory relationships between these genes and miRNAs
constituted a core regulatory subnetwork at multiple stages of disease
development. We identified three subnetworks, which we termed the HCC
subnetwork, the HCC-HCV subnetwork and the HCV subnetwork.
Extraction of candidate risk regulatory pathways
Risk regulatory pathways were extracted from the three subnetworks with
the breadth-first search (BFS) algorithm. We identified all pathways in
each subnetwork that run from a node with in-degree 0 to a node with
out-degree 0, and pathways with a length greater than 2 were regarded as
candidate risk pathways.
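The pathway extraction described above can be sketched as a breadth-first enumeration of simple paths. Two points are assumptions rather than details from the text: path length is counted in edges (so "greater than 2" becomes at least 3 edges), and a path never revisits a node, which keeps the search finite if the subnetwork contains feedback loops. The function and node names are hypothetical.

```python
from collections import defaultdict, deque


def candidate_pathways(edges, min_edges=3):
    """Enumerate simple paths from in-degree-0 nodes to out-degree-0 nodes by BFS.

    edges: list of directed (source, target) pairs.
    min_edges: minimum number of edges a reported path must have (assumed meaning
    of "length greater than 2").
    """
    successors = defaultdict(list)
    has_incoming = set()
    nodes = set()
    for source, target in edges:
        successors[source].append(target)
        has_incoming.add(target)
        nodes.update((source, target))

    roots = sorted(node for node in nodes if node not in has_incoming)
    pathways = []
    queue = deque([root] for root in roots)  # each queue entry is a partial path
    while queue:
        path = queue.popleft()
        next_nodes = successors.get(path[-1], [])
        if not next_nodes:  # out-degree 0: the path is complete
            if len(path) - 1 >= min_edges:
                pathways.append(path)
            continue
        for node in next_nodes:
            if node not in path:  # skip revisits so that cycles cannot loop forever
                queue.append(path + [node])
    return pathways


# Toy subnetwork (hypothetical node names).
subnetwork = [
    ("TF1", "miR1"),
    ("miR1", "G1"),
    ("G1", "G2"),
    ("TF1", "G3"),
    ("TF2", "miR1"),
]
paths = candidate_pathways(subnetwork)
for path in paths:
    print(" -> ".join(path))
```

Here the one-edge path from TF1 to G3 is discarded by the length filter, while the two three-edge paths ending at G2 are kept. The number of simple paths can grow quickly in dense networks, so this enumeration is only practical for subnetworks of moderate size.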
Prediction of key regulators
Gene expression varies in different tissues and during different
diseases. Some genes are expressed at a specific stage of a given
disease, while some genes continue to play a role throughout the
process. We analyzed all the pathways in the three subnetworks to
identify the most critical pathways in each network by examining highly
shared genes. We propose a KP score to evaluate key pathways, which is
calculated as follows:
The KP score is defined in terms of the following quantities: the number
of nodes on a pathway within a subnetwork; the number of intersection
nodes between the pathway and the subnetwork; the length of the longest
pathway within the subnetwork that satisfies the conditions; the location
weight score of the intersection gene within the pathway; and an
indicator that equals 1 if the gene at a given position is an
intersection gene and 0 otherwise. Upstream genes receive higher scores.
Survival analysis
In this study, we constructed three subnetworks for HCC, HCV samples and
normal samples, and identified key pathways from the subnetworks. We
next investigated whether the key regulators could distinguish HCC
patients with good outcomes from those with poor outcomes. We obtained
mRNA expression, miRNA expression and clinical information from the TCGA
HCC dataset. Next, we used the K-means method (K=2) to cluster all
patients into two groups based on their mRNA and miRNA expression.
Finally, Kaplan–Meier curves and log-rank tests were used to evaluate the
difference in overall survival time between the two groups of patients.
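The final comparison step can be illustrated with a small standard-library sketch of the Kaplan–Meier estimator and the two-group log-rank test. It assumes the two patient groups have already been formed (for example by the K-means clustering described above), uses made-up follow-up times, and is not the software the authors used; in practice a dedicated survival analysis package would normally be preferred.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    """
    curve, survival = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        died = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - died / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 d.f."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(all_times, all_events) if e}):
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with one degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Toy follow-up data (hypothetical, arbitrary time units).
group_a_times, group_a_events = [5, 8, 12, 20], [1, 1, 0, 1]
group_b_times, group_b_events = [2, 3, 4, 6], [1, 1, 1, 0]

km_a = kaplan_meier(group_a_times, group_a_events)
chi2, p_value = logrank_test(group_a_times, group_a_events, group_b_times, group_b_events)
print(km_a)
print(chi2, p_value)
```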