Please check the latest news (change log) and keep this package updated.
- BERT_download() connects to the Internet, while all the other functions run offline.
- BERT_info().
- add.tokens and add.method parameters for BERT_vocab() and FMAT_run(): An experimental functionality to add new tokens (e.g., out-of-vocabulary words, compound words, or even phrases) as [MASK] options. Validation is still needed for this novel practice (one of my ongoing projects), so for now please use it only at your own risk, or wait until my validation work is published.
- All functions other than BERT_download() now import local model files only, without automatically downloading models. Users must first use BERT_download() to download models.
- FMAT_load(): Better to use FMAT_run() directly.
- BERT_vocab() and ICC_models().
- summary.fmat(), FMAT_query(), and FMAT_run() (significantly faster because it can now simultaneously estimate all [MASK] options for each unique query sentence, so running time depends only on the number of unique queries, not on the number of [MASK] options).
- If the reticulate package version is ≥ 1.36.1, then FMAT should be updated to ≥ 2024.4. Otherwise, out-of-vocabulary [MASK] words may not be identified and marked. Now FMAT_run() directly uses model vocabulary and token ID to match [MASK] words. To check if a [MASK] word is in the model vocabulary, please use BERT_vocab().
- BERT_download() (downloading models to local cache folder “%USERPROFILE%/.cache/huggingface”) to differentiate from FMAT_load() (loading saved models from local cache). However, FMAT_load() can also download models silently if they have not been downloaded.
- gpu parameter (see Guidance for GPU Acceleration) in FMAT_run() to allow for specifying an NVIDIA GPU device on which the fill-mask pipeline will be allocated. GPU performs roughly 3x faster than CPU for the fill-mask pipeline. By default, FMAT_run() automatically detects and uses any available GPU with an installed CUDA-supported Python torch package (if not, it uses CPU).
- FMAT_run().
- BERT_download(), FMAT_load(), and FMAT_run().
- parallel in FMAT_run(): FMAT_run(model.names, data, gpu=TRUE) is the fastest.
- progress in FMAT_run().
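
Taken together, the entries above suggest a workflow of downloading a model once, checking the [MASK] words against its vocabulary, and then running everything offline on CPU or GPU. The following is a minimal sketch of that workflow, not an excerpt from the package documentation. The helper names needs_fmat_update() and run_fmat_demo(), the model name "bert-base-uncased", and the query sentence and word lists are all illustrative assumptions. The argument layout of FMAT_query() and BERT_vocab(), and the .() list shorthand, are assumed from the package's documented usage and may differ between versions, so please check the help pages of the installed version.

```r
# Base-R helper (hypothetical, not part of FMAT): TRUE when the installed
# versions match the combination the change log warns about, i.e.
# reticulate >= 1.36.1 together with FMAT < 2024.4.
needs_fmat_update <- function(reticulate_version, fmat_version) {
  package_version(reticulate_version) >= "1.36.1" &&
    package_version(fmat_version) < "2024.4"
}

# Example (requires both packages to be installed):
# needs_fmat_update(as.character(utils::packageVersion("reticulate")),
#                   as.character(utils::packageVersion("FMAT")))

# Hypothetical wrapper (not part of FMAT) sketching the workflow.
# use_gpu = NULL leaves device selection to FMAT_run()'s automatic detection;
# TRUE or FALSE is passed on through the gpu parameter.
run_fmat_demo <- function(model = "bert-base-uncased", use_gpu = NULL) {
  library(FMAT)

  # Step 1 (needs Internet): download the model files to the local cache.
  BERT_download(model)

  # Step 2 (offline): check whether the [MASK] words are in the vocabulary.
  # Positional arguments are used because parameter names are an assumption.
  print(BERT_vocab(model, c("He", "She")))

  # Step 3 (offline): build the queries and run the fill-mask pipeline.
  # .() is assumed to be FMAT's shorthand for list(), as in its examples.
  query <- FMAT_query(
    "[MASK] is a doctor.",
    MASK = .(Male = "He", Female = "She")
  )
  if (is.null(use_gpu)) {
    FMAT_run(model, query)
  } else {
    FMAT_run(model, query, gpu = use_gpu)
  }
}
```

Nothing is downloaded when the block is sourced, because the FMAT calls sit inside run_fmat_demo(). Calling run_fmat_demo() is what triggers the one-time download, and later calls should then find the model in the local cache.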