This package provides infrastructure to make text datasets available within R, even when they are too large to store within an R package or are licensed in such a way that prevents them from being included in OSS-licensed packages.
Do you want to add a new dataset to the textdata package?
- Create a file named `prefix_*.R` in the `R/` folder, where `*` is the name of the dataset. Supported prefixes include:
  - `dataset_`
  - `lexicon_`
- In that file, define three functions named `download_*()`, `process_*()` and `dataset_*()`.
  - The `download_*()` function should take one argument named `folder_path`. It has two tasks. First, it should check whether the file is already downloaded; if so, it should return `invisible()`. If the file isn't at the path, it should download the file to that path.
  - The `process_*()` function should take two arguments, `folder_path` and `name_path`. `folder_path` is the path to the file fetched by `download_*()`, and `name_path` is the path where the polished data should live. The main point of `process_*()` is to turn the downloaded file into a `.rds` file containing a tidy tibble.
  - The `dataset_*()` function should wrap `load_dataset()`.
- Add the `process_*()` function to the named list `process_functions` in the file `process_functions.R`.
- Add the `download_*()` function to the named list `download_functions` in the file `download_functions.R`.
- Update the `print_info` list in the `info.R` file.
- Add `dataset_*.R` to the `@include` tags in `download_functions.R`.
- Add the dataset to `README.Rmd`.
- Add the dataset to `_pkgdown.yml`.
- Add an entry to the `NEWS.md` file.
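
The download and process steps above can be sketched roughly as follows. This is a minimal illustration, not code from the package: the dataset name `example`, the file name `example.csv`, and the URL are hypothetical placeholders, and keying the two named lists by dataset name is an assumption. The `dataset_*()` wrapper is left out because the arguments of `load_dataset()` are not described here.

```r
# Hypothetical dataset called "example"; file name and URL are placeholders.
download_example <- function(folder_path) {
  file_path <- file.path(folder_path, "example.csv")

  # Task 1: if the raw file is already present, return invisibly.
  if (file.exists(file_path)) {
    return(invisible())
  }

  # Task 2: otherwise download the raw file into folder_path.
  utils::download.file(
    url = "https://example.org/example.csv", # placeholder URL
    destfile = file_path
  )
}

process_example <- function(folder_path, name_path) {
  # Read the raw file that download_example() placed in folder_path.
  raw <- utils::read.csv(
    file.path(folder_path, "example.csv"),
    stringsAsFactors = FALSE
  )

  # Convert to a tibble when the tibble package is available.
  if (requireNamespace("tibble", quietly = TRUE)) {
    raw <- tibble::as_tibble(raw)
  }

  # Store the polished data as an .rds file at name_path.
  saveRDS(raw, name_path)
}

# Registering the functions; list names keyed by dataset are an assumption.
download_functions <- list(example = download_example)
process_functions <- list(example = process_example)
```

Because `download_example()` returns early when the file already exists, calling it a second time on the same folder should not trigger another download.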

What are the guidelines for adding datasets?
- Use `word` instead of `words` for column names.
- For datasets that come with separate testing and training sets, let the user pick which one to retrieve with a `split` argument, similar to how `dataset_ag_news()` does.
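
One possible way to handle a `split` argument is base R's `match.arg()`, sketched below. The helper `resolve_split_file()` and the `example_*.rds` file names are hypothetical and only show how a choice between `"train"` and `"test"` could be validated; this is not a copy of how `dataset_ag_news()` is written.

```r
# Hypothetical helper: map a validated split choice to a processed file name.
resolve_split_file <- function(split = c("train", "test")) {
  # match.arg() picks the first choice by default and errors on invalid input.
  split <- match.arg(split)
  paste0("example_", split, ".rds")
}

resolve_split_file()       # uses the default, "train"
resolve_split_file("test") # selects the testing set
```

A `dataset_*()` function could accept `split` in the same way and hand the resulting file name to `load_dataset()`.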