2. Guided Partial Least Squares (guided-PLS)

Koki Tsuyuzaki

Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research
k.t.the-answer@hotmail.co.jp

2025-08-25

Introduction

In this vignette, we introduce guided partial least squares (guided-PLS), a novel supervised dimensionality reduction method.

Test data can be generated with the toyModel function.

library("guidedPLS")
data <- guidedPLS::toyModel("Easy")
str(data, 2)
## List of 8
##  $ X1      : int [1:100, 1:300] 86 101 95 106 113 85 88 103 106 84 ...
##  $ X2      : int [1:200, 1:150] 106 81 91 101 91 105 111 81 113 105 ...
##  $ Y1      : int [1:100, 1:50] 101 77 77 87 101 89 111 113 101 112 ...
##  $ Y1_dummy: num [1:100, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
##  $ Y2      : int [1:200, 1:50] 107 81 102 90 84 106 97 90 88 115 ...
##  $ Y2_dummy: num [1:200, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
##  $ col1    : chr [1:100] "#66C2A5" "#66C2A5" "#66C2A5" "#66C2A5" ...
##  $ col2    : chr [1:200] "#66C2A5" "#66C2A5" "#66C2A5" "#66C2A5" ...

As the following heatmaps show, the data matrices have a three-block structure.

suppressMessages(library("fields"))  # provides image.plot(), a heatmap with a colour legend
layout(c(1,2,3))                     # stack three panels vertically
image.plot(data$Y1_dummy, main="Y1 (Dummy)", legend.mar=8)
image.plot(data$Y1, main="Y1", legend.mar=8)
image.plot(data$X1, main="X1", legend.mar=8)
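Y1_dummy appears to be a one-hot (dummy) encoding of the three sample groups visible in the heatmaps. The base-R sketch below, which uses hypothetical labels and a small made-up matrix rather than toyModel output, shows how such a dummy matrix can be built with model.matrix and what multiplying its transpose with a data matrix does: each row of the result is the sum of the data rows belonging to one group. This is one way to picture the projection "via the label matrices" that guided-PLS uses.

```r
# Hypothetical group labels for six samples (not taken from toyModel)
group <- factor(c("a", "a", "b", "b", "c", "c"))

# One-hot (dummy) label matrix: N = 6 rows, one column per group (I = 3)
Y <- model.matrix(~ group - 1)

# A small illustrative data matrix with N = 6 rows and M = 2 columns
X <- matrix(1:12, nrow = 6, ncol = 2)

# Y^T X sums the rows of X within each group, giving an I x M matrix
YX <- crossprod(Y, X)
YX
# groupa: 3 15; groupb: 7 19; groupc: 11 23
```

With a real-valued (non-dummy) label matrix such as Y1, the same product becomes a weighted sum of rows instead of a plain within-group sum.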

Guided Partial Least Squares (guided-PLS)

Suppose that we have two data matrices \(X_1\) (\(N \times M\)) and \(X_2\) (\(S \times T\)), whose row vectors are assumed to be centered. Because these two matrices share neither rows nor columns, integrating them is not trivial. Such a data structure is called “diagonal” and is known to be a barrier to omics data integration (Argelaguet et al. 2021).

The problem becomes tractable if we also have another pair of matrices \(Y_1\) (\(N \times I\)) and \(Y_2\) (\(S \times I\)), which are the label matrices for the rows of \(X_1\) and \(X_2\), respectively. Unlike \(X_1\) and \(X_2\), these two matrices share the same \(I\) columns.

In guided-PLS, \(X_1\) and \(X_2\) are first projected into this shared, lower-dimensional label space as \(Y_{1}^{T} X_{1}\) (\(I \times M\)) and \(Y_{2}^{T} X_{2}\) (\(I \times T\)), and PLS-SVD is then performed on this pair as follows:

\[ \max_{W_{1},W_{2}} \mathrm{tr} \left( W_{1}^{T} X_{1}^{T} Y_{1} Y_{2}^{T} X_{2} W_{2} \right)\ \mathrm{s.t.}\ W_{1}^{T}W_{1} = W_{2}^{T}W_{2} = I_{K} \]

where \(W_1\) (\(M \times K\)) and \(W_2\) (\(T \times K\)) are the weight matrices and \(K\) is the number of components.
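Because the trace is maximized under orthonormality constraints, the standard PLS-SVD argument gives \(W_1\) and \(W_2\) as the top-\(K\) left and right singular vectors of \(X_{1}^{T} Y_{1} Y_{2}^{T} X_{2}\), and the optimum equals the sum of the top-\(K\) singular values. The base-R sketch below illustrates this, assuming \(Y_1\) is \(N \times I\) and \(Y_2\) is \(S \times I\) (matching toyModel's Y1, 100 × 50, and Y2, 200 × 50). guided_pls_sketch is a hypothetical helper, not the guidedPLS package's implementation; it performs no centering, so inputs are assumed to be preprocessed already, and the random matrices are placeholders.

```r
# Hypothetical helper (NOT the guidedPLS package API): a base-R sketch of the
# guided-PLS objective. Assumes X1 (N x M), X2 (S x T), Y1 (N x I), Y2 (S x I)
# and that X1, X2 are already preprocessed (no centering is done here).
guided_pls_sketch <- function(X1, X2, Y1, Y2, k = 2) {
  YX1 <- crossprod(Y1, X1)       # Y1^T X1 : I x M
  YX2 <- crossprod(Y2, X2)       # Y2^T X2 : I x T
  C <- crossprod(YX1, YX2)       # X1^T Y1 Y2^T X2 : M x T
  s <- svd(C, nu = k, nv = k)    # top-k singular vectors of C
  W1 <- s$u                      # M x k, orthonormal columns
  W2 <- s$v                      # T x k, orthonormal columns
  list(W1 = W1, W2 = W2,
       d = s$d[seq_len(k)],      # top-k singular values
       scoreX1 = X1 %*% W1,      # N x k scores for X1
       scoreX2 = X2 %*% W2)      # S x k scores for X2
}

# Placeholder data with the dimension pattern described in the text
set.seed(1)
X1 <- matrix(rnorm(100 * 30), nrow = 100)
X2 <- matrix(rnorm(200 * 15), nrow = 200)
Y1 <- matrix(rnorm(100 * 5), nrow = 100)
Y2 <- matrix(rnorm(200 * 5), nrow = 200)

fit <- guided_pls_sketch(X1, X2, Y1, Y2, k = 2)

# At the optimum, the trace objective equals the sum of the top-k singular values
obj <- sum(diag(t(fit$W1) %*% t(X1) %*% Y1 %*% t(Y2) %*% X2 %*% fit$W2))
all.equal(obj, sum(fit$d))  # should be TRUE
```

Since \(X_{1}^{T} Y_{1} Y_{2}^{T} X_{2}\) has rank at most \(I\), asking for more than \(I\) components cannot add further nonzero singular values.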

Basic Usage

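guided-PLS is run with the guidedPLS function. Here, k=2 components are computed, and the scores of X1 and X2 are plotted together in the resulting two-dimensional space.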

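out <- guidedPLS(X1=data$X1, X2=data$X2, Y1=data$Y1, Y2=data$Y2, k=2)
# Plot both sets of scores in one space: colours come from data$col1 and data$col2,
# triangles (pch=2) are rows of X1 and plus signs (pch=3) are rows of X2
plot(rbind(out$scoreX1, out$scoreX2), col=c(data$col1, data$col2),
    pch=c(rep(2, nrow(out$scoreX1)), rep(3, nrow(out$scoreX2))))
legend("bottomleft", legend=c("XY1", "XY2"), pch=c(2,3))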

Session Information

## R version 3.6.3 (2020-02-29)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Rocky Linux 9.5 (Blue Onyx)
## 
## Matrix products: default
## BLAS:   /home/koki/miniconda3/lib/libblas.so.3.9.0
## LAPACK: /home/koki/miniconda3/lib/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] fields_14.1       viridis_0.6.2     viridisLite_0.4.1 spam_2.9-1       
## [5] guidedPLS_1.1.0  
## 
## loaded via a namespace (and not attached):
##  [1] highr_0.10       bslib_0.3.1      compiler_3.6.3   pillar_1.7.0    
##  [5] jquerylib_0.1.4  tools_3.6.3      digest_0.6.31    dotCall64_1.0-2 
##  [9] jsonlite_1.8.4   evaluate_0.20    lifecycle_1.0.1  tibble_3.1.2    
## [13] gtable_0.3.0     lattice_0.20-45  pkgconfig_2.0.3  rlang_0.4.11    
## [17] Matrix_1.5-4     DBI_1.1.3        yaml_2.3.7       xfun_0.38       
## [21] fastmap_1.1.1    gridExtra_2.3    dplyr_1.0.6      knitr_1.42      
## [25] maps_3.4.1       generics_0.1.3   sass_0.4.0       vctrs_0.3.8     
## [29] tidyselect_1.1.1 grid_3.6.3       glue_1.4.2       R6_2.5.1        
## [33] fansi_1.0.4      rmarkdown_2.11   irlba_2.3.5.1    purrr_0.3.4     
## [37] ggplot2_3.3.5    magrittr_2.0.3   scales_1.1.1     htmltools_0.5.5 
## [41] ellipsis_0.3.2   assertthat_0.2.1 colorspace_2.1-0 utf8_1.2.3      
## [45] munsell_0.5.0    crayon_1.5.2

References

Argelaguet, R., et al. 2021. “Computational Principles and Challenges in Single-Cell Data Integration.” Nature Biotechnology 39: 1202–15.