Towards Confidence Estimates in Cascade Networks using the SelectBoost Package

Frédéric Bertrand and Myriam Maumy-Bertrand

Université de Strasbourg and CNRSIRMA, labex IRMIA
frederic.bertrand@utt.fr

2022-11-29

#Introduction

Aims of the vignette

Extending results from the Cascade package: reverse engineering with selectboost to compute confidence indices for a fitted model. We first fit a model to a cascade network using the Cascade package inference function then we compute confidence indices for the inferred links using the Selecboost algorithm.

If you are a Linux/Unix or a Macos user, you can install a version of SelectBoost with support for doMC from github with:

devtools::install_github("fbertran/SelectBoost", ref = "doMC")

References

Code

Code to reproduce the datasets saved with the package and some the figures of the article Bertrand et al. (2020), Bioinformatics. https://doi.org/10.1093/bioinformatics/btaa855

Data simulation

library(Cascade)

We define the F array for the simulations.

T<-4
F<-array(0,c(T-1,T-1,T*(T-1)/2))

for(i in 1:(T*(T-1)/2)){diag(F[,,i])<-1}
F[,,2]<-F[,,2]*0.2
F[2,1,2]<-1
F[3,2,2]<-1
F[,,4]<-F[,,2]*0.3
F[3,1,4]<-1
F[,,5]<-F[,,2]

We set the seed to make the results reproducible

set.seed(1)
Net<-Cascade::network_random(
  nb=100,
  time_label=rep(1:4,each=25),
  exp=1,
  init=1,
  regul=round(rexp(100,1))+1,
  min_expr=0.1,
  max_expr=2,
  casc.level=0.4
)
Net@F<-F

We simulate gene expression according to the network Net

M <- Cascade::gene_expr_simulation(
  network=Net,
  time_label=rep(1:4,each=25),
  subject=5,
  level_peak=200)

Network inference

We infer the new network.

Net_inf_C <- Cascade::inference(M,cv.subjects=TRUE)
#> We are at step :  1
#> The convergence of the network is (L1 norm) : 0.0068
#> We are at step :  2
#> The convergence of the network is (L1 norm) : 0.00121
#> We are at step :  3
#> The convergence of the network is (L1 norm) : 0.00096

Heatmap of the coefficients of the Omega matrix of the network. Run the code to get the graph.

stats::heatmap(Net_inf_C@network, Rowv = NA, Colv = NA, scale="none", revC=TRUE)
Fab_inf_C <- Net_inf_C@F

Compute the confidence indices for the inference

library(SelectBoost)
set.seed(1)

By default the crossvalidation is made subjectwise according to a leave one out scheme and the resampling analysis is made at the .95 c0 level. To pass CRAN tests, use.parallel = FALSE is required. Set use.parallel = TRUE and select the number of cores using ncores = 4.

net_pct_selected <- selectboost(M, Fab_inf_C, use.parallel = FALSE)
#> 
#> We are at peak :  2
#> .........................
#> We are at peak :  3
#> .........................
#> We are at peak :  4
#> .........................
net_pct_selected_.5 <- selectboost(M, Fab_inf_C, c0value = .5, use.parallel = FALSE)
#> 
#> We are at peak :  2
#> .........................
#> We are at peak :  3
#> .........................
#> We are at peak :  4
#> .........................
net_pct_selected_thr <- selectboost(M, Fab_inf_C, group = group_func_1, use.parallel = FALSE)
#> 
#> We are at peak :  2
#> .........................
#> We are at peak :  3
#> .........................
#> We are at peak :  4
#> .........................

Use cv.subject=FALSE to use default crossvalidation

net_pct_selected_cv <- selectboost(M, Fab_inf_C, cv.subject=FALSE, use.parallel = FALSE)
#> 
#> We are at peak :  2
#> .........................
#> We are at peak :  3
#> .........................
#> We are at peak :  4
#> .........................

Analysis of the confidence indices

Use plot to display the result of the confidence analysis.

plot(net_pct_selected)

Run the code to plot the other results.

plot(net_pct_selected_.5)
plot(net_pct_selected_thr)
plot(net_pct_selected_cv)

Run the code to plot the remaning graphs.

Distribution of non-zero (absolute value > 1e-5) coefficients

hist(Net_inf_C@network[abs(Net_inf_C@network)>1e-5])

Plot of confidence at .95 resampling level versus coefficient value for non-zero (absolute value > 1e-5) coefficients

plot(Net_inf_C@network[abs(Net_inf_C@network)>1e-5],net_pct_selected@network.confidence[abs(Net_inf_C@network)>1e-5])
hist(net_pct_selected@network.confidence[abs(Net_inf_C@network)>1e-5])

Plot of confidence at .5 resampling level versus coefficient value for non-zero (absolute value > 1e-5) coefficients

plot(Net_inf_C@network[abs(Net_inf_C@network)>1e-5],net_pct_selected_.5@network.confidence[abs(Net_inf_C@network)>1e-5])
hist(net_pct_selected_.5@network.confidence[abs(Net_inf_C@network)>1e-5])

Plot of confidence at .95 resamling level with groups created by thresholding the correlation matrix versus coefficient value for non-zero (absolute value > 1e-5) coefficients.

plot(Net_inf_C@network[abs(Net_inf_C@network)>1e-5],net_pct_selected_thr@network.confidence[abs(Net_inf_C@network)>1e-5])
hist(net_pct_selected_thr@network.confidence[abs(Net_inf_C@network)>1e-5])

Plot of confidence at .95 resampling level versus coefficient value for non-zero (absolute value > 1e-5) coefficients using standard cross-validation.

plot(Net_inf_C@network[abs(Net_inf_C@network)>1e-5],net_pct_selected_cv@network.confidence[abs(Net_inf_C@network)>1e-5])
hist(net_pct_selected_cv@network.confidence[abs(Net_inf_C@network)>1e-5])

Further improvements

For further recommandations on the choice of the c0 parameter, for instance as a quantile, please consult the SI of Bertrand et al. 2020.