This document explains PCA, clustering, LFDA and MDS related plotting using {ggplot2} and {ggfortify}.
{ggfortify} lets {ggplot2} know how to interpret PCA objects. After loading {ggfortify}, you can use the ggplot2::autoplot function for stats::prcomp and stats::princomp objects.
library(ggfortify)
df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)
autoplot(pca_res)
The PCA result contains only numeric values. If you want to colour points by a non-numeric column of the original data, pass the original data using the data keyword and then specify the column name with the colour keyword. Use help(autoplot.prcomp) (or help(autoplot.*) for any other object) to check the available options.
autoplot(pca_res, data = iris, colour = 'Species')
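Why the original data has to be passed can be checked on the prcomp object itself. The following is a small sketch using base R only; the variable names score_cols and has_species are introduced here for illustration. It shows that the score matrix holds the principal components but no Species column.

```r
# Fit the same PCA as above on the four numeric iris columns.
pca_res <- prcomp(iris[1:4], scale. = TRUE)

# The score matrix only has one column per principal component.
score_cols <- colnames(pca_res$x)
print(score_cols)

# Species is not part of the PCA result, so it must come from `data`.
has_species <- "Species" %in% score_cols
print(has_species)
```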
Passing label = TRUE draws a label for each data point, taken from rownames.
autoplot(pca_res, data = iris, colour = 'Species', label = TRUE, label.size = 3)
Passing shape = FALSE makes a plot without points. In this case, label is turned on unless otherwise specified.
autoplot(pca_res, data = iris, colour = 'Species', shape = FALSE, label.size = 3)
Passing loadings = TRUE draws eigenvectors.
autoplot(pca_res, data = iris, colour = 'Species', loadings = TRUE)
You can attach eigenvector labels and change some options.
autoplot(pca_res, data = iris, colour = 'Species',
loadings = TRUE, loadings.colour = 'blue',
loadings.label = TRUE, loadings.label.size = 3)
By default, each component is scaled in the same way as in a standard biplot. You can disable the scaling by specifying scale = 0.
autoplot(pca_res, scale = 0)
{ggfortify} supports stats::factanal objects in the same manner as PCA objects. The available options are the same as for PCA.
Important You must specify the scores option when calling factanal to calculate scores (by default no scores are computed, so the scores element of the result is NULL). Otherwise, plotting will fail.
d.factanal <- factanal(state.x77, factors = 3, scores = 'regression')
autoplot(d.factanal, data = state.x77, colour = 'Income')
autoplot(d.factanal, label = TRUE, label.size = 3,
loadings = TRUE, loadings.label = TRUE, loadings.label.size = 3)
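The effect of the scores option can be inspected without plotting. This is a minimal sketch using stats::factanal only; fa_default and fa_scored are names chosen here for illustration.

```r
# Without a scores method, the fitted object carries no factor scores.
fa_default <- factanal(state.x77, factors = 3)
print(is.null(fa_default$scores))

# With scores = "regression", one row of scores per observation is stored.
fa_scored <- factanal(state.x77, factors = 3, scores = "regression")
print(dim(fa_scored$scores))
```

Checking is.null(x$scores) before calling autoplot is one way to catch a missing scores argument early.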
{ggfortify} supports the stats::kmeans class. You must explicitly pass the original data to the autoplot function via the data keyword, because a kmeans object doesn't store the original data. The result is automatically coloured by cluster.
set.seed(1)
autoplot(kmeans(USArrests, 3), data = USArrests)
autoplot(kmeans(USArrests, 3), data = USArrests, label = TRUE, label.size = 3)
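That a kmeans object carries cluster assignments but not the input data can be confirmed by listing its components. A short sketch in base R follows; km is a name chosen here for illustration.

```r
set.seed(1)
km <- kmeans(USArrests, 3)

# The components describe the clustering; none of them is the input data.
print(names(km))

# One cluster assignment per row of the original data.
print(table(km$cluster))
```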
{ggfortify} supports the cluster::clara, cluster::fanny and cluster::pam classes, as well as cluster::silhouette.
Because these objects store the original data, there is no need to pass it explicitly.
library(cluster)
autoplot(clara(iris[-5], 3))
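In contrast to kmeans, the {cluster} objects keep a copy of the input. The sketch below checks this for pam and clara with their default arguments; pam_res and clara_res are names chosen here for illustration.

```r
library(cluster)

pam_res <- pam(iris[-5], 3)
clara_res <- clara(iris[-5], 3)

# Both objects carry the data they were fitted on in their `data` element.
print(dim(pam_res$data))
print(dim(clara_res$data))
```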
Specifying frame = TRUE in autoplot for stats::kmeans and cluster::* objects draws a convex hull around each cluster.
autoplot(fanny(iris[-5], 3), frame = TRUE)
If you want a probability ellipse, {ggplot2} 1.0.0 or later is required. Pass any value supported by the type keyword of ggplot2::stat_ellipse via the frame.type option.
autoplot(pam(iris[-5], 3), frame = TRUE, frame.type = 'norm')
If you want a silhouette plot, pass a silhouette object to the autoplot function.
autoplot(silhouette(pam(iris[-5], 3L)))
For more information on Silhouette plots and how they can be used, see base R example, scikit-learn example and original paper.
The {lfda} package supports a set of Local Fisher Discriminant Analysis methods. You can use autoplot to plot the analysis result in the same manner as PCA.
library(lfda)
# Local Fisher Discriminant Analysis (LFDA)
model <- lfda(iris[-5], iris[, 5], r = 3, metric="plain")
autoplot(model, data = iris, frame = TRUE, frame.colour = 'Species')
# Semi-supervised Local Fisher Discriminant Analysis (SELF)
model <- self(iris[-5], iris[, 5], beta = 0.1, r = 3, metric="plain")
autoplot(model, data = iris, frame = TRUE, frame.colour = 'Species')
Even though MDS functions return a matrix or a list (not an object of a specific class), {ggfortify} can infer the underlying class from the list attributes and perform autoplot.
NOTE Inference from matrix is not supported.
NOTE {ggfortify} can plot a stats::dist instance as a heatmap.
autoplot(eurodist)
stats::cmdscale performs classical MDS and returns the point coordinates as a matrix, so you cannot use autoplot in this case. However, if any of eig = TRUE, add = TRUE or x.ret = TRUE is specified, stats::cmdscale returns a list instead of a matrix. In these cases, {ggfortify} can infer how to plot it via autoplot. Refer to help(cmdscale) to check what these options are.
autoplot(cmdscale(eurodist, eig = TRUE))
Specify label = TRUE to plot labels.
autoplot(cmdscale(eurodist, eig = TRUE), label = TRUE, label.size = 3)
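The difference between the two return types of stats::cmdscale can be seen directly. This is a small base R sketch; mds_matrix and mds_list are names chosen here for illustration.

```r
# Default call: a plain numeric matrix of coordinates.
mds_matrix <- cmdscale(eurodist)
print(class(mds_matrix))

# With eig = TRUE: a list whose `points` element holds the coordinates.
mds_list <- cmdscale(eurodist, eig = TRUE)
print(names(mds_list))
print(dim(mds_list$points))
```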
MASS::isoMDS and MASS::sammon perform non-metric MDS and return a list that contains the point coordinates. Thus, autoplot can be used.
NOTE In the background, autoplot.matrix is called to plot MDS. See help(autoplot.matrix) to check available options.
library(MASS)
autoplot(isoMDS(eurodist), colour = 'orange', size = 4, shape = 3)
## initial value 7.505733
## final value 7.505688
## converged
Passing shape = FALSE makes a plot without points. In this case, label is turned on unless otherwise specified.
autoplot(sammon(eurodist), shape = FALSE, label.colour = 'blue', label.size = 3)
## Initial stress : 0.01705
## stress after 10 iters: 0.00951, magic = 0.500
## stress after 20 iters: 0.00941, magic = 0.500
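The list structure returned by the two {MASS} functions can be checked as follows. This is a sketch, not part of the original examples; iso_res and sam_res are names chosen here for illustration, and trace = FALSE is used only to suppress the iteration log.

```r
library(MASS)

iso_res <- isoMDS(eurodist, trace = FALSE)
sam_res <- sammon(eurodist, trace = FALSE)

# Both results are lists with a `points` matrix of coordinates.
print(names(iso_res))
print(names(sam_res))
print(dim(sam_res$points))
```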