We developed a novel software package (DRviaSPCN) that enables repurposing drugs via a subpathway (SP) crosstalk network. The main process include construction of the SP network and calculation of the centrality scores of SPs to reflect the influence of SP crosstalk, calculating the enrichment score of subpathways and weighting them with corresponding centrality score to get weighted enrichment score(weighted-ES), calculation of enrichment scores of drug- and disease-induced dysfunctional SPs and weighted them by the centrality scores of SPs, evaluation of the drug-disease reverse association at the weighted SP level, identification of cancer candidate drugs. There are also several functions used to visualize the results such as visualization of the subpathway network structure of interest, chemical molecular formula of the drug or compound, and heatmap of the expression of subpathways in different sample types.
This vignette illustrates how to easily use the DRviaSPCN package. Here, with the use of functions in this package, users could identify potential therapeutic drugs for disease through estimating drug-disease reverse association.
The method consists of three parts:
1.Evaluating influence of SP crosstalk. We think two SPs are functionally related if there are some genes in two SPs that share at least more than one common biological function (GO term). To analyze the influence of SP crosstalk, we first construct a weighted SP/GO bipartite network. We defined an edge between a SP and a Go term if they have at least one common gene. For a certain disease, we used the different level of the shared genes between two types of samples (disease and normal), and the Jaccard index between a pair of SP and go term to define the weight of the edge. Next, we constructed a SP-SP network based on the SP-GO network. Similarly, we defined an edge between two subpathways if they have at least a common biology function. After the above steps, a SP-SP network induced by disease could be constructed. Then, we applied random walk algorithm to the SP-SP network to calculate eigenvector centrality score of SPs which can reflect the influence of SP crosstalk. Finally, the statistical significance (p-value) of these centrality scores was assessed using a bootstrap-based randomization method. In the same way, the eigenvector centrality score and p-value of SPs induced by each drug can also be calculated.
2.Evaluating drug-disease reverse association based on disease- and drug-induced SPs weighted by the SP crosstalk. In this part, CMap build 02 raw data was downloaded from the CMap website (Lamb et al. 2006). After constructing gene expression profiles, the Gene Set Enrichment Analysis (GSEA) was used to calculate the enrichment score of SPs For a disease, we applied the GSEA method to the gene expression profile of the disease. After calculating the enrichment score (ES) of SPs, we weighted them with corresponding centrality scores to calculate weighted enrichment score (weighted-ES). In the same way, the weighted enrichment score (weighted-ES) of SPs induced by each drug can also be calculated. We defined a drug-disease reverse association score (RS) to reflect the treatment extent of a drug at the SP level. For each drug and certain disease, the SPs were ranked in descending order based on the corresponding weighed-ES. We mapped the up-and down-regulated subpathways induced by disease to the ranked list of SPs induced by each drug to calculate the ks.up and ks.down. And the RS is equal to ks.up - ks.down.
3.Identifying statistically significant disease-candidate drugs. Generally, a drug was applied with different instances (different concentrations, cancer cell lines or duration), we thus calculated the RS for each instance of all drugs. After normalization, a ranked instance list was constructed according to the normalized scores of all instances. For a given drug, the instance set of the drug was extracted and mapped to the ranked list, and the Kolmogorov-Smirnov statistic was used to compute the drug enrichment score (DES) based on the instance set. If the instance set of the drug enriches at the negative RS region of the list, the DES will be strong negative indicating the drug in different instances may have consistent treatment effect on the disease. To estimate the statistical significance (empirical p-values) of the DES, we randomly selected the same number of instances for the drug to recalculate the DES, and we repeated this process N times. The p-value was then calculated as p-value = M/N, where M is the number of randomly selected DESs less than the observed DES. In the package, the default randomly selected times are set at 1000. To correct for multiple comparisons, we adjusted the empirical p-values of drugs using the false discovery rate (FDR) method
This package provides the CalCentralityScore
function to calculate the centrality score of subpathways and the
corresponding p-value.
This package provides the Optimaldrugs
function to
calculate the DES of drugs and corresponding p-value.
This package provides the plotSPW
function to plot
SP network structure.
This package provides the getMolecularFM
function to
plot the chemical molecular formula of the drug or compound.
This package provides the Disease2SPheatmap
function
to plot heatmap of the activities of subpathways in different sample
types that are regulated by disease.
This package provides the Drug2SPheatmap
function to
plot heatmap of the activities of subpathways in different sample types
that are regulated by drugs.
This package provides the GetExample
function to
return example data and environment variables, such as gene expression
profile, sample label and so on.
In addition, the essential data DrugSPESCMatrix
and
DrugSPPvalueMatrix
which are subpathways weighted-ES
induced by all drugs and statistic significance (p-value) of centrality
score of subpathways regulated by all drugs were stored in our
DRviaSPCNData package. Users could download and use this package by the
following code:
### Download DRviaSPCNData package from GitHub
library(devtools)
install_github("hanjunwei-lab/DRviaSPCNData",force = TRUE)
library(DRviaSPCNData)
### Get weighted-ES of subpathways
DrugSPESCMatrix<-GetData("DrugSPESCMatrix")
### Get p-value of subpathways centrality score
DrugSPPvalueMatrix<-GetData("DrugSPPvalueMatrix")
The function “CalCentralityScore” was used to calculate the centrality scores of SPs to reflect the crosstalk influence, which were used as weights in the calculation of drug-disease reverse association score.
The commands are as follows:
## Warning: package 'igraph' was built under R version 4.3.2
###Obtain input data
GEP<-GetExample('GEP')# Get the gene expression profile
Slabel<-GetExample('Slabel')# Get the sample class label
###Run the function
CentralityScoreResult<-CalCentralityScore(ExpData=GEP,Label=Slabel,nperm = 1000)
## SubPathID Size Centralscore Pvalue FDR
## 1 04920_4 36 0.016816433 0.000 0.000
## 2 05167_28 9 0.015350713 0.000 0.000
## 3 04657_3 10 0.014931183 0.003 0.611
## 4 05205_51 7 0.014164259 0.006 0.611
## 5 05200_31 22 0.010549439 0.007 0.611
## 6 05205_52 14 0.011014076 0.009 0.611
## 7 04625_6 9 0.013364221 0.011 0.611
## 8 05134_7 14 0.009284269 0.011 0.611
## 9 05167_6 30 0.009896507 0.011 0.611
## 10 01521_4 47 0.009160094 0.013 0.611
The function Optimaldrugs
is used
to calculate the DES and statistic significance of drugs. The
detailed algorithm can be seen in the introduction part. Users could
screen out the optimal therapeutic drugs according to a specific
threshold. Here we provide weighted and unweighted methods to calculate
the score, which can be selected by parameters weight = ’’ .
The screening criteria of the up- and down-regulated subpathways can be
changed through the parameters pcut = ’’ and topcut =
’’. The commands are as follows:
###Run the function
Opdrugresult<-Optimaldrugs(ExpData=GEP,Label=Slabel,DrugSPESC=DrugSPESCMatrix,
CentralityScore=CentralityScoreResult,nperm=1000,topcut=10,
pcut=0.01,weight=FALSE)
## Drug DES pvalue FDR
## 421 valproic acid -0.4621059 0.000 0.0000000
## 1296 xylometazoline -0.9143394 0.000 0.0000000
## 1299 monobenzone -0.9189044 0.000 0.0000000
## 1305 Prestwick-559 -0.9795918 0.000 0.0000000
## 1309 gefitinib -0.9989259 0.000 0.0000000
## 1292 methotrexate -0.8941998 0.001 0.2181667
## 1158 trichostatin A -0.6055317 0.002 0.2908889
## 1182 fluphenazine -0.6356069 0.002 0.2908889
## 1302 4,5-dianilinophthalimide -0.9481740 0.002 0.2908889
## 1275 phthalylsulfathiazole -0.8378088 0.004 0.3490667
The function plotSPW
used to plot
a subpathway network structure graph. The user just needs to input an
interest subpathway id such as “00020_4”.
The commands are as follows:
###load depend package
library(igraph)
###plot network graph of the subpathway "00020_4"
plotSPW("00020_4")
The function getMolecularFm
can
obtain a chemical molecular formula of the drug or compound. Then users
could visualize the molecular formula through function “plot”.
The commands are as follows:
The function Disease2SPheatmap
plots a heat map of the subpathways that are regulated by disease. The
input is the result of function CalCentralityScore
, disease
gene expression profile and sample class in the expression profile. We
map subpathways to the disease gene expression through ssgsea to get a
subpathway abundance matrix. Then we visualize the matrix by heatmap.
Users could change the threshold that is used to screen significant
subpathways through the param pcut.
The commands are as follows:
## Warning: package 'pheatmap' was built under R version 4.3.2
###Run the function
Disease2SPheatmap(CentralityScore=CentralityScoreResult,ExpData=GEP,Label=Slabel,pcut=0.05,
bk=c(-2,2),cluster.rows=FALSE,cluster.cols=FALSE,
show.rownames=TRUE,show.colnames=FALSE,
col=c("navy","firebrick3"),cell.width=NA,
cell.height=NA,scale="row",fontsize=7,
fontsize.row=9,fontsize.col=10)
The function Drug2SPheatmap
plots
heatmaps of the subpathways that are regulated by drugs. The function
input is a character which is drug name, disease gene expression profile
and sample class in the expression profile. We map subpathways to the
disease gene expression through ssgsea to get a subpathway abundance
matrix. Then we visualize the matrix by heatmap. Users could change the
threshold that is used to screen significant subpathways through the
param pcut. The result of this function is a list including all
heatmaps.
The commands are as follows:
###Load depend package
library(GSVA)
library(pheatmap)
###Run the function
Drug2SPheatmap(drugname="methotrexate_HL60_6_8.8e-06",
DrugSPPvalue=DrugSPPvalueMatrix,ExpData=GEP,
Label=Slabel,pcut=0.05,bk=c(-2,2),cluster.rows=FALSE,
cluster.cols=FALSE,show.rownames=TRUE,
show.colnames=FALSE,col=c("navy","firebrick3"),
cell.width=NA,cell.height=NA,scale="row",
fontsize=6,fontsize.row=9,fontsize.col=10)