- textTrainN() including subsets sampling (new: default change from random to subsets), use_same_penalty_mixture (new: default change from FALSE to TRUE) and std_err (new output).
- textTrainPlot()
- textPredict() functionality.
- textTopics()
- textTopics() trains a BERTopic model with different modules and returns the model, data, and topic_document distributions based on c-TF-IDF
- textTopicsTest() can perform multiple tests (correlation, t-test, regression) between a BERTopic model from textTopics() and data
- textTopicsWordcloud() can plot word clouds of topics tested with textTopicsTest()
- textTopicsTree() prints out a tree structure of the hierarchical topic structure
- textEmbed() now embeds one column at a time and reduces word_types for each column. This can break some code and produce different results in plots where word_types are based on several embedded columns.
- textTrainN() and textTrainNPlot() evaluate prediction accuracy across number of cases.
- textTrainRegression() and textTrainRandomForest() now take a tibble as input in strata.
- textTrainRegression()
- textPredictTest() can handle auc
- textEmbed() is faster (thanks to faster handling of aggregating layers)
- sort parameter in textEmbedRawLayers().
- Possibility to use GPU for MacOS M1 and M2 chips using device = "mps" in textEmbed()
- textFineTune() as an experimental function is implemented
- max_length implemented in textTranslate()
- textEmbedReduce() implemented
- textEmbed(decontextualize=TRUE), which gave error.
- textSimilarityTest() for version 1.0 because it needs more evaluations.
- model, so that layers = -2 works in textEmbed().
- set_verbosity.
- sorting_xs_and_x_append from Dim to Dim0 when renaming x_appended variables.
- first to append_first and made it an option in textTrainRegression() and textTrainRandomForest().
- textEmbed() layers = 11:12 is now second_to_last.
- textEmbedRawLayers default is now second_to_last.
- textEmbedLayerAggregation() layers = 11:12 is now layers = "all".
- textEmbed() and textEmbedRawLayers() x is now called texts.
- textEmbedLayerAggregation() now uses layers = "all", aggregation_from_layers_to_tokens, aggregation_from_tokens_to_texts.
- textZeroShot() is implemented.
- textDistanceNorm() and textDistanceMatrix()
- textDistance() can compute cosine distance.
- textModelLayers() provides N layers for a given model
- max_token_to_sentence in textEmbed()
- aggregate_layers is now called aggregation_from_layers_to_tokens.
- aggregate_tokens is now called aggregation_from_tokens_to_texts.
- single_word_embeddings is now called word_types_embeddings
- textEmbedLayersOutput() is now called textEmbedRawLayers()
- textDimName()
- textEmbed(): dim_name = TRUE
- textEmbed(): single_context_embeddings = TRUE
- textEmbed(): device = "gpu"
- explore_words in textPlot()
- x_append_target in textPredict() function
- textClassify(), textGeneration(), textNER(), textSum(), textQA(), and textTranslate().
- x_add to x_append across functions
- set_seed to language analysis tasks
- x' in training and prediction
- textPredict does not take word_embeddings and x_append (not new_data)
- textClassify() (under development)
- textGeneration() (under development)
- textNER() (under development)
- textSum() (under development)
- textQA() (under development)
- textTranslate() (under development)
- textSentiment(), from huggingface transformers models.
- textEmbed(), textTrainRegression(), textTrainRandomForest() and textProjection().
- dim_names to set unique dimension names in textEmbed() and textEmbedStatic().
- textPredictAll() function that can take several models, word embeddings, and variables as input to provide multiple outputs.
- textTrain() functions with x_append.
- textPredict related functions are located in their own file
- text_version number
- textEmbedLayersOutput and textEmbed can provide single_context_embeddings
- return_tokens option from textEmbed (since it is only relevant for textEmbedLayersOutput)
- $single_we when decontexts is FALSE.
- Logistic regression is default for classification in textTrain.
- model_max_length in textEmbed().
- textModels() shows downloaded models.
- textModelsRemove() deletes specified models.
- textSimilarityTest() when uneven number of cases are tested.
- textDistance() function with distance measures.
- textSimilarity().
- textSimilarity() in textSimilarityTest(), textProjection() and textCentrality() for plotting.
- textTrainRegression() concatenates word embeddings when provided with a list of several word embeddings.
- word_embeddings_4$singlewords_we.
- textCentrality(), words to be plotted are selected with word_data1_all$extremes_all_x >= 1 (rather than ==1).
- textSimilarityMatrix() computes semantic similarity among all combinations in a given word embedding.
- textDescriptives() gets options to remove NA and compute total scores.
- textDescriptives()
- textrpp_initiate()
- tokenization is made with NLTK from python.
- textWordPredictions() (which has a trial period/not fully developed and might be removed in future versions); p-values are not yet implemented.
- textPlot() for objects from both textProjection() and textWordPredictions()
- textrpp_initiate() runs automatically in library(text) when default environment exists
- textSimilarityTest().
- stringr to stringi (and removed tokenizer) as imported package
- textrpp_install() installs a conda environment with text required python packages.
- textrpp_install_virtualenv() installs a virtual environment with text required python packages.
- textrpp_initialize() initializes installed environment.
- textrpp_uninstall() uninstalls conda environment.
- textEmbed() and textEmbedLayersOutput() support the use of GPU using the device setting.
- remove_words makes it possible to remove specific words from textProjectionPlot()
- textProjection() and textProjectionPlot(): it is now possible to add points of the aggregated word embeddings in the plot
- textProjection(): it is now possible to manually add words to the plot in order to explore them in the word embedding space.
- textProjection(): it is possible to add color or remove words that are more frequent on the opposite “side” of its dot product projection.
- textProjection() with split == quartile, the comparison distribution is now based on the quartile data (rather than the data for mean)
- textEmbed() with decontexts=TRUE.
- textSimilarityTest() is not giving error when using method = unpaired, with unequal number of participants in each group.
- textPredictTest() function to significance test correlations of different models. 0.9.11

This version is now on CRAN.

### New Features
- Adding option to deselect the step_centre and step_scale in training.
- Cross-validation method in textTrainRegression() and textTrainRandomForest() has two options: cv_folds and validation_split. (0.9.02)
- Better handling of NA in step_naomit in training.
- DistilBert model works (0.9.03)
- textProjectionPlot() plots words extreme in more than just one feature (i.e., words are now plotted that satisfy, for example, both plot_n_word_extreme and plot_n_word_frequency). (0.9.01)
- textTrainRegression() and textTrainRandomForest() also have a function that selects the maximum evaluation measure result (previously only the minimum was selected, which was correct for rmse, for example, but not for r) (0.9.02)
- id_nr in training and predict by using workflows (0.9.02).