| Type: | Package |
| Title: | Correlation Heatmaps |
| Version: | 0.3.2 |
| Date: | 2026-02-04 |
| Author: | Vidal Fey [aut, cre], Henri Sara [aut] |
| Maintainer: | Vidal Fey <vidal.fey@gmail.com> |
| Description: | Create correlation heatmaps from a numeric matrix. Ensembl Gene ID row names can be converted to Gene Symbols using, e.g., BioMart. Optionally, data can be clustered and filtered by correlation, tree cutting and/or number of missing values. Genes of interest can be highlighted in the plot and correlation significance be indicated by asterisks encoding corresponding P-Values. Plot dimensions and label measures are adjusted automatically by default. The plot features rely on the heatmap.n2() function in the 'heatmapFlex' package. |
| Depends: | Biobase |
| Imports: | WGCNA, heatmapFlex, convertid (≥ 0.2.1), methods, graphics, grDevices, rappdirs |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Suggests: | rmarkdown, knitr, BiocManager, org.Hs.eg.db, org.Mm.eg.db |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-02-04 15:10:11 UTC; fsvife |
| Repository: | CRAN |
| Date/Publication: | 2026-02-09 13:10:16 UTC |
Draw Correlation Heatmaps
Description
Create correlation heatmaps from a numeric matrix. Ensembl Gene ID row names can be converted to Gene Symbols using, e.g., BioMart. Optionally, data can be clustered and filtered by correlation, tree cutting and/or number of missing values. Genes of interest can be highlighted in the plot and correlation significance be indicated by asterisks encoding corresponding P-Values. Plot dimensions and label measures are adjusted automatically by default. The plot features rely on the heatmap.n2() function in the 'heatmapFlex' package.
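Filtering by the number of missing values can be pictured with a few lines of base R. This is a minimal sketch, not coreheat's code. It assumes that na.frac (default 0.1 in cormap2 and clust_cormap) is the largest tolerated fraction of missing values per gene, which is an assumption about the argument's meaning.

```r
# Hypothetical missing-value filter (assumed semantics of na.frac):
# keep genes (rows) whose fraction of NAs does not exceed na.frac.
set.seed(5)
x <- matrix(rnorm(4 * 20), nrow = 4,
            dimnames = list(paste0("gene-", 1:4), NULL))
x[1, 1:6] <- NA      # gene-1: 30% missing
x[3, 5] <- NA        # gene-3: 5% missing
na.frac <- 0.1
keep <- rowMeans(is.na(x)) <= na.frac     # per-gene fraction of NAs
x_filt <- x[keep, , drop = FALSE]         # gene-1 is dropped
```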
Details
| Package: | coreheat |
| Type: | Package |
| Initial version: | 0.1.0 |
| Created: | 2016-08-11 |
| License: | GPL-3 |
The main function to be called by end users is cormap2, which is a wrapper performing all steps necessary to create a heatmap.
Author(s)
Vidal Fey <vidal.fey@gmail.com>, Henri Sara <henri.sara@gmail.com>
Maintainer: Vidal Fey <vidal.fey@gmail.com>
Cluster a correlation matrix and return the sorted matrix for plotting.
Description
Helper function to cluster the correlation matrix and return the sorted matrix for plotting.
Usage
clust_cormap(
cormat,
na.frac = 0.1,
distfn = function(cm) (1 - cm),
method = "complete",
cor.cluster = 1,
cor.window = NULL,
cor.thr = 0.8,
cor.mar = 0.05,
cut.thr = 0.9,
cut.size = 5,
list.output = FALSE,
verbose = FALSE
)
Arguments
cormat
na.frac
distfn
method
cor.cluster
cor.window
cor.thr
cor.mar
cut.thr
cut.size
list.output
verbose
Value
A correlation matrix, or a list of the matrix and other values needed for plotting.
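The core clustering step can be sketched in base R using the defaults shown in the usage above (distfn = function(cm) (1 - cm), method = "complete"). This is a minimal sketch assuming those defaults are passed to stats::hclust; it is not the package's implementation and ignores na.frac, the cor.* filters and tree cutting.

```r
# Minimal sketch (not coreheat's code): cluster a correlation matrix with
# the default distance 1 - r and complete linkage, then sort it for plotting.
set.seed(1)
x <- matrix(rnorm(60), nrow = 6,
            dimnames = list(paste0("gene-", 1:6), NULL))
cm <- cor(t(x))                           # gene-by-gene Pearson correlations
distfn <- function(cm) (1 - cm)           # same form as the clust_cormap default
hc <- hclust(as.dist(distfn(cm)), method = "complete")
cm_sorted <- cm[hc$order, hc$order]       # rows and columns in dendrogram order
```

Reordering rows and columns with the same index keeps the matrix symmetric, so the diagonal of ones stays on the diagonal of the plotted heatmap.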
Draw correlation maps from large datasets.
Description
cormap2() generates pair-wise correlations from an input ExpressionSet object, a data.frame or a
numerical matrix. With the default options it also produces a heatmap.
Usage
cormap2(
x,
cormat = NULL,
lab = NULL,
convert = TRUE,
biomart = FALSE,
biom.data.set = "hsapiens_gene_ensembl",
biom.mart = "ensembl",
host = "https://www.ensembl.org",
biom.filter = "ensembl_gene_id",
biom.attributes = c("ensembl_gene_id", "hgnc_symbol"),
biom.cache = rappdirs::user_cache_dir("biomaRt"),
use.cache = TRUE,
cluster_correlations = TRUE,
main = "",
postfix = NULL,
cex = NULL,
na.frac = 0.1,
cor.cluster = 1,
cor.window = NULL,
cor.thr = NULL,
cor.mar = 0.5,
cut.thr = NULL,
cut.size = 5,
autoadj = TRUE,
labelheight = NULL,
labelwidth = NULL,
add.sig = FALSE,
genes2highl = NULL,
order.list = TRUE,
doPlot = TRUE,
updateProgress = NULL,
verbose = FALSE
)
Arguments
x
cormat
lab
convert
biomart
biom.data.set
biom.mart
host
biom.filter
biom.attributes
biom.cache
use.cache
cluster_correlations
main
postfix
cex
na.frac
cor.cluster
cor.window
cor.thr
cor.mar
cut.thr
cut.size
autoadj
labelheight
labelwidth
add.sig
genes2highl
order.list
doPlot
updateProgress
verbose
Details
P-Values are calculated from the t statistic of the correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2),
where r is the correlation coefficient and n is the number of samples with no missing values for each gene (row-wise,
ncol(eset) minus the number of columns that contain an NA). P-Values are then calculated using pt and
corrected to account for the two-tailed nature of the test, i.e., the possibility of positive as well as negative correlation.
The approach to calculating correlation significance was adopted from Miles, J., & Banyard, P. (2007),
"Calculating the exact significance of a Pearson correlation in MS Excel".
The asterisks encode significance as follows:
| P < 0.05 | * |
| P < 0.01 | ** |
| P < 0.001 | *** |
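The significance calculation and asterisk encoding described above can be sketched in base R. The helper names cor_pvalue and p_to_stars are hypothetical, not part of coreheat; the sketch assumes the two-tailed correction is 2 * pt(-|t|, df = n - 2) and checks it against stats::cor.test.

```r
# Hypothetical helpers (not coreheat functions).
# r: correlation coefficient, n: number of complete samples.
cor_pvalue <- function(r, n) {
  t <- r * sqrt(n - 2) / sqrt(1 - r^2)  # t statistic of the correlation
  2 * pt(-abs(t), df = n - 2)           # two-tailed P-value
}
# Map P-values to the asterisk codes in the table above
p_to_stars <- function(p) {
  ifelse(p < 0.001, "***", ifelse(p < 0.01, "**", ifelse(p < 0.05, "*", "")))
}

set.seed(42)
a <- rnorm(20)
b <- a + rnorm(20)
r <- cor(a, b)
p <- cor_pvalue(r, length(a))
p_ref <- cor.test(a, b)$p.value         # base R Pearson test for comparison
stars <- p_to_stars(p)
```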
By default (autoadj = TRUE), the label measures (labelheight, labelwidth and cex) are adjusted automatically;
their default values are hard-coded in the helper function heatmap.cor. The values calculated by the helper
function plotAdjust can be overridden by setting any of these arguments to a valid numeric or lcm(numeric) value.
Value
Invisibly returns the correlation matrix, although the function is mainly called for its side effect of producing
a heatmap (if doPlot = TRUE, which is the default).
References
Miles, J., & Banyard, P. (2007). Understanding and using statistics in psychology: A practical introduction. Sage Publications Ltd. https://psycnet.apa.org/record/2007-06525-000.
Examples
# 1. Generate a random 20 x 10 matrix (20 genes, 10 samples) with two distinct sets of samples and plot it
# with default settings, without ID conversion since the IDs are made up:
set.seed(1234)
mat <- matrix(c(rnorm(100, mean = 1), rnorm(100, mean = -1)), nrow = 20)
rownames(mat) <- paste0("gene-", 1:20)
colnames(mat) <- paste0(rep(c("A", "B"), each = 5), rep(1:5, 2))
cormap2(mat, convert=FALSE, main="Random matrix")
# 2. Use a real-world dataset from TCGA (see README file in inst/extdata directory).
# Package 'convertid' is used to convert Ensembl Gene IDs to HGNC Symbols
## Read data and prepare input data frame
fl <- system.file("extdata", "PrCaTCGASample.txt", package = "coreheat", mustWork = TRUE)
dat0 <- read.delim(fl, stringsAsFactors=FALSE)
dat1 <- data.frame(dat0[, grep("TCGA", names(dat0))], row.names=dat0$ensembl_gene_id)
cormap2(dat1, main="TCGA data frame + ID conversion")
# 3. Use separately supplied IDs with a matrix created from the data frame of the
# previous example and highlight genes of interest
dat2 <- as.matrix(dat0[, grep("TCGA", names(dat0))])
sym <- dat0$hgnc_symbol
cormap2(dat2, convert=FALSE, lab=sym, genes2highl=c("GNAS","NCOR1","AR", "ATM"),
main="TCGA matrix + custom labels")
# 4. Use an ExpressionSet object and add significance asterisks
## For simplicity, we create the ExpressionSet from a matrix created
## from the data frame in the second example
expr <- Biobase::ExpressionSet(as.matrix(dat1))
cormap2(expr, add.sig=TRUE, main="TCGA ExpressionSet object + ID conversion")
# More examples can be found in the vignette.
Automatically split clusters based on noise level and hierarchy
Description
cormap_filt splits (cuts) the dendrogram at a given threshold, dividing it into larger or
smaller "sub-clusters". Correlation P-Values (see eset_cor) are converted into a sub-cluster-wise signal metric
representing significance, which is used for filtering. Optionally, up to three plots are produced,
the third one being a heatmap filtered based on significance and tree height cutting.
Usage
cormap_filt(
x,
na.frac = 0.1,
method = "ward.D",
do.abs = TRUE,
main = "correlation map",
postfix = NULL,
p.thr = 0.01,
cex = 0.2,
cex.clust = cex,
cex.filt = cex,
cut.thr = NULL,
cor.thr = NULL,
cor.cluster = 1,
cor.window = NULL,
do.plots = c("dend", "full.heat", "filt.heat"),
genes2highl = NULL,
order.list = TRUE,
convert = TRUE,
biomart = FALSE,
biom.data.set = "hsapiens_gene_ensembl",
biom.mart = "ensembl",
host = "https://www.ensembl.org",
biom.filter = "ensembl_gene_id",
biom.attributes = c("ensembl_gene_id", "hgnc_symbol"),
biom.cache = rappdirs::user_cache_dir("biomaRt"),
use.cache = TRUE,
add.sig = FALSE,
verbose = FALSE
)
Arguments
x
na.frac
method
do.abs
main
postfix
p.thr
cex
cex.clust
cex.filt
cut.thr
cor.thr
cor.cluster
cor.window
do.plots
genes2highl
order.list
convert
biomart
biom.data.set
biom.mart
host
biom.filter
biom.attributes
biom.cache
use.cache
add.sig
verbose
Details
P-Values are calculated from the t statistic of the correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2),
where r is the correlation coefficient and n is the number of samples with no missing values for each gene (row-wise,
ncol(eset) minus the number of columns that contain an NA). P-Values are then calculated using pt and
corrected to account for the two-tailed nature of the test, i.e., the possibility of positive as well as negative correlation.
The approach to calculating correlation significance was adopted from Miles, J., & Banyard, P. (2007),
"Calculating the exact significance of a Pearson correlation in MS Excel".
To obtain a suitable metric for isolating significant sub-clusters, P-Values are summarized as -log10(median(pval)),
where pval is the parallel maximum (pmax) of all P-Values belonging to the sub-cluster and
1e-38, which avoids taking the logarithm of zero (0).
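The sub-cluster metric can be sketched in base R as follows. This is a hypothetical illustration, not coreheat's code: the cut height, the use of off-diagonal P-Values only, NA for singleton clusters, and the filtering rule metric > -log10(p.thr) are all assumptions. Only the -log10(median(pmax(P, 1e-38))) form and the ward.D default come from the text above.

```r
# Hypothetical sketch of the sub-cluster signal metric (not coreheat's code).
set.seed(7)
x <- matrix(rnorm(8 * 12), nrow = 8,
            dimnames = list(paste0("gene-", 1:8), NULL))
r <- cor(t(x))
diag(r) <- 1                                  # exact self-correlation
n <- ncol(x)                                  # no missing values here
tstat <- r * sqrt(n - 2) / sqrt(1 - r^2)
pvals <- 2 * pt(-abs(tstat), df = n - 2)      # two-tailed P-values

# Cluster (ward.D, the cormap_filt default) and cut at a hypothetical height
hc <- hclust(as.dist(1 - r), method = "ward.D")
groups <- cutree(hc, h = 0.5 * max(hc$height))

# Metric per sub-cluster from its off-diagonal P-values;
# singletons get NA (assumption)
metric <- sapply(split(names(groups), groups), function(g) {
  if (length(g) < 2) return(NA_real_)
  pv <- pvals[g, g]
  pval <- pmax(pv[upper.tri(pv)], 1e-38)      # floor avoids log10(0)
  -log10(median(pval))
})

# Assumed filtering rule: keep sub-clusters whose metric exceeds -log10(p.thr)
p.thr <- 0.01
filt <- which(metric > -log10(p.thr))
```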
Value
A list. If the dendrogram is cut, i.e., cut.thr is not NULL, the list contains:
| clusters: | the list of cluster labels from the lower component of the cut.dendrogram output, which is a list of the branches obtained by cutting the tree |
| filt: | the index of the cluster labels passing the signal metric threshold |
| filt_cluster: | the list of the filtered cluster labels |
| h: | the cut threshold |
| p.thr: | the P-Value threshold for filtering sub-clusters |
| metric: | the signal metrics for all sub-clusters |
| cormat: | the clustered (ordered) correlation matrix |
| hclust: | a list of hierarchical clustering metrics (output of hclust) |
| pvalues: | the correlation P-Value matrix |
If no tree cutting is applied, the list contains:
| cormat: | the clustered (ordered) correlation matrix |
| hclust: | a list of hierarchical clustering metrics (output of hclust) |
| pvalues: | the correlation P-Value matrix |
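The clusters element refers to the lower component returned by cutting a dendrogram. A minimal base-R sketch of obtaining such branch labels with stats' cut method for dendrograms is shown below; the data and cut height are hypothetical, and this is not coreheat's code.

```r
# Minimal sketch: cut a dendrogram at height h and collect the gene labels
# of each lower branch (hypothetical data and cut height).
set.seed(3)
x <- matrix(rnorm(10 * 8), nrow = 10,
            dimnames = list(paste0("gene-", 1:10), NULL))
hc <- hclust(as.dist(1 - cor(t(x))), method = "ward.D")
h <- 0.5 * max(hc$height)                     # hypothetical cut threshold
branches <- cut(as.dendrogram(hc), h = h)     # list with $upper and $lower
clusters <- lapply(branches$lower, labels)    # gene labels per branch
```

Because h lies below the root height, the tree splits into at least two lower branches, and together they contain every gene exactly once.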
Helper function to calculate the correlation matrix.
Description
Helper function to calculate the correlation matrix.
Usage
eset_cor(x, with.pvalues = TRUE, order.list = TRUE, verbose = FALSE)
Arguments
x
with.pvalues
order.list
verbose
Details
P-Values are calculated from the t statistic of the correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2),
where r is the correlation coefficient and n is the number of samples with no missing values for each gene (row-wise,
ncol(eset) minus the number of columns that contain an NA). P-Values are then calculated using pt and
corrected to account for the two-tailed nature of the test, i.e., the possibility of positive as well as negative correlation.
The approach to calculating correlation significance was adopted from Miles, J., & Banyard, P. (2007),
"Calculating the exact significance of a Pearson correlation in MS Excel".
Value
A correlation matrix or a list with three slots: the correlation matrix, the number of samples with no
missing value for each gene and the P-Values matrix.
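A base-R sketch of the three slots described above, for data containing missing values, follows. It is hypothetical and not the package's implementation: the per-gene count follows the text (ncol minus the columns with an NA), but using the number of jointly observed samples for each pair's P-Value is an assumption made here.

```r
# Hypothetical sketch of eset_cor-like output (not coreheat's code).
set.seed(11)
x <- matrix(rnorm(5 * 10), nrow = 5,
            dimnames = list(paste0("gene-", 1:5), NULL))
x[2, c(3, 7)] <- NA                           # introduce missing values
x[4, 1] <- NA

cormat <- cor(t(x), use = "pairwise.complete.obs")
# Samples with no missing value per gene: ncol minus columns with an NA
n <- ncol(x) - rowSums(is.na(x))

# Assumption: each pair's P-value uses its jointly observed sample count
ok <- 1 * (!is.na(x))
npair <- tcrossprod(ok)                       # gene-by-gene shared samples
r <- cormat
diag(r) <- 1                                  # exact self-correlation
tstat <- r * sqrt(npair - 2) / sqrt(1 - r^2)
pvalues <- 2 * pt(-abs(tstat), df = npair - 2)

res <- list(cormat = cormat, n = n, pvalues = pvalues)
```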
References
Miles, J., & Banyard, P. (2007). Understanding and using statistics in psychology: A practical introduction. Sage Publications Ltd. https://psycnet.apa.org/record/2007-06525-000.