The R package for creating and installing data packages that follow the Open Knowledge Foundation’s Data Package Protocol.
dpmr has three core functions:
datapackage_init
: initialises a new data package
from an R data frame and (optionally) a meta data list.
datapackage_install
: installs a data package either
stored locally or remotely, e.g. on GitHub.
datapackage_info:
reads a data package’s metadata
(stored in its datapackage.json
file) into the R Console and (optionally) as a list.
To initiate a barebones data package in the current working directory
called My_Data_Package
use:
# Create fake data
<- B <- C <- sample(1:20, size = 20, replace = TRUE)
A <- sort(rep('a', 20))
ID <- data.frame(ID, A, B, C)
Data
datapackage_init(df = Data, package_name = 'My_Data_Package')
This will create a data package with barebones metadata in a datapackage.json file. You can then edit this by hand.
Alternatively, you can also create a list with the metadata in R and have this included with the data package:
<- list(name = 'My_Data_Package',
meta_list title = 'A fake data package',
last_updated = Sys.Date(),
version = '0.1',
license = data.frame(type = 'PDDL-1.0',
url = 'http://opendatacommons.org/licenses/pddl/'),
sources = data.frame(name = 'Fake',
web = 'No URL, its fake.'))
datapackage_init(df = Data, meta = meta_list)
Note if you don’t include the resources
fields in your
metadata list, then they will automatically be added. These fields
identify the data files’ paths and data schema
.
To load a data package called gdp stored in the current working directory use:
<- datapackage_install(path = 'gdp/') gdp_data
You can install a package stored remotely using its URL. In this example we directly download the gdp data package from GitHub using the URL for its zip file:
<- 'https://github.com/datasets/gdp/archive/master.zip'
URL <- datapackage_install(path = URL) gdp_data
Use datapackage_info
to read a data package’s metadata
into R:
# Print information when working directory is a data package
datapackage_info()
datapackage_update
for updating a data package’s data and
metadata.
Specify data variable descriptions in meta list.
Load inline data from the datapackage.json file.
Load data from a GitHub repo using GitHub usernames and repos.
Licensed under GPL-3