This package is an implementation of the additive profile clustering (ADPROCLUS) method in R. It can be used to obtain overlapping clustering models for object-by-variable data matrices. It also contains the low dimensional ADPROCLUS method, which reduces the dimensionality of the variables while searching for overlapping clusters. This is useful when the object-by-variable data contains a very large number of variables.
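To make "overlapping clustering model" concrete: ADPROCLUS is commonly described as approximating the data matrix by the product of a binary membership matrix and a profile matrix, so that an object belonging to several clusters is modelled as the sum of their profiles. The base R sketch below does not call the package; the matrices `A`, `P` and `M` and all their values are made up for illustration.

```r
# Toy illustration of the additive profile idea; all values are made up.
# A: binary membership matrix (4 objects x 2 clusters); a row may contain several 1s.
A <- matrix(c(1, 0,
              0, 1,
              1, 1,
              0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("obj", 1:4), c("cl1", "cl2")))

# P: profile matrix (2 clusters x 3 variables).
P <- matrix(c( 1,  2,  3,
              10, 20, 30),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("cl1", "cl2"), c("v1", "v2", "v3")))

# Model part of the data: each row is the sum of the profiles
# of the clusters the object belongs to.
M <- A %*% P
print(M)
```

Here `obj3` belongs to both clusters, so its model row is the sum of both profiles, while `obj4` belongs to none and is modelled as a row of zeros.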
You can install the latest version from CRAN:
install.packages("adproclus")
Or install the development version of ADPROCLUS from GitHub with:
# install.packages("devtools")
devtools::install_github("henry-heppe/adproclus")
This is a basic example which shows you how to use the regular ADPROCLUS and the low dimensional ADPROCLUS:
library("adproclus")
# import data
our_data <- adproclus::CGdata
# perform ADPROCLUS to get an overlapping clustering model
model_full <- adproclus(data = our_data, nclusters = 2)
# perform low dimensional ADPROCLUS to get an overlapping clustering model in terms of a smaller number of variables
model_lowdim <- adproclus_low_dim(data = our_data, nclusters = 3, ncomponents = 2)
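As a rough picture of what "in terms of a smaller number of variables" means, the low dimensional variant is usually formulated with the cluster profiles expressed in a few components that are then mapped back to the original variables. The base R sketch below assumes that formulation and does not call the package; `C`, `B` and `P` are illustrative names with made-up values, not fields of the returned model object.

```r
# Toy sketch of a low dimensional profile structure; all values are made up.
# C: cluster profiles expressed in 1 component (2 clusters x 1 component).
C <- matrix(c(2, -1), nrow = 2)

# B: loadings of 3 variables on that component, scaled to unit length.
B <- matrix(c(1, 2, 2) / 3, nrow = 3)

# Profiles in the original variable space implied by C and B.
P <- C %*% t(B)
print(P)

# The implied profile matrix has rank at most the number of components.
print(qr(P)$rank)
```

With many variables and few components, far fewer numbers describe the profiles than in the full dimensional model.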
To select the number of clusters (and, in the low dimensional case, the number of components), the package provides two model selection functions.
library("adproclus")
# estimate multiple ADPROCLUS models
models <- mselect_adproclus(data = CGdata, min_nclusters = 2, max_nclusters = 4)
# estimate multiple low dimensional ADPROCLUS models
models_lowdim <- mselect_adproclus_low_dim(data = CGdata, min_nclusters = 2, max_nclusters = 4, min_ncomponents = 1, max_ncomponents = 3)
# visualize models as a scree plot
plot_scree_adpc(models)
# visualize the low dimensional models as a scree plot
plot_scree_adpc(models_lowdim)
# select the best full dimensional model
best_model <- select_by_CHull(models)
# select the conditionally optimal low dimensional model for each number of clusters
best_models_lowdim <- select_by_CHull(models_lowdim)
# visualize the preselected set of low dimensional models
plot_scree_adpc_preselected(best_models_lowdim)
The package also provides functions to generate initial membership matrices from which the alternating least squares procedure can start. There are three ways to obtain such matrices: random, semi-random, and rational (see the respective function documentation for details).
library("adproclus")
# import data
our_data <- adproclus::CGdata
# Obtaining a membership matrix where the entries are randomly assigned values of 0 or 1
start_allocation1 <- get_random(our_data, 3)
# Obtaining a membership matrix based on a profile matrix consisting of randomly selected rows of the data
start_allocation2 <- get_semirandom(our_data, 3)
# Obtaining a membership matrix from a user-defined rational start profile matrix (here the first 3 rows of the data)
start_allocation3 <- get_rational(our_data, our_data[1:3, ])$A
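To illustrate how a membership matrix can be derived from a start profile matrix, the base R sketch below picks random rows of a made-up data matrix as profiles and then assigns each object the binary membership pattern whose summed profiles fit it best. The helper `semirandom_start()` is hypothetical and only meant to convey the idea; the package's own functions may work differently in detail.

```r
set.seed(1)
# Made-up data: 6 objects x 2 variables (not CGdata).
X <- matrix(rnorm(12), nrow = 6)
k <- 2

# Hypothetical helper, not part of the package.
semirandom_start <- function(X, k) {
  # Start profiles: k randomly selected rows of the data.
  P <- X[sample(nrow(X), k), , drop = FALSE]
  # All 2^k binary membership patterns, one per row.
  patterns <- as.matrix(expand.grid(rep(list(0:1), k)))
  # Model row implied by each pattern (sum of the selected profiles).
  candidates <- patterns %*% P
  A <- matrix(0, nrow = nrow(X), ncol = k)
  for (i in seq_len(nrow(X))) {
    # Squared error of object i against every candidate model row.
    sse <- rowSums(sweep(candidates, 2, X[i, ])^2)
    A[i, ] <- patterns[which.min(sse), ]
  }
  list(A = A, P = P)
}

start <- semirandom_start(X, k)
print(start$A)
```

Enumerating all patterns is only feasible for a small number of clusters, since the number of candidates doubles with every additional cluster.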