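pipeR provides various styles of function chaining methods: the Pipe operator %>>%, the Pipe() function, and the pipeline() function.
Each of them represents a distinct pipeline model, but they share almost the same set of features. A value can be piped to the next expression as the first unnamed argument of a function, as the dot symbol (.) in the expression, as a user-defined symbol in a lambda expression, for a side-effect that passes the input on unchanged, or for an assignment that saves an intermediate value.
The syntax is designed to make the pipeline more readable and friendly to a wide variety of operations.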
pipeR Tutorial is a highly recommended complete guide to pipeR.
This document is also translated into 日本語 (by @hoxo_m).
Install the latest development version from GitHub:
devtools::install_github("renkun-ken/pipeR")
Install from CRAN:
install.packages("pipeR")
The following code is an example written in the traditional approach. It performs a bootstrap on the mpg values in the built-in dataset mtcars and plots their density function estimated by a Gaussian kernel.
plot(density(sample(mtcars$mpg, size = 10000, replace = TRUE),
kernel = "gaussian"), col = "red", main="density of mpg (bootstrap)")
The code is deeply nested and can be hard to read and maintain. In the following examples, the traditional code is rewritten with the Pipe operator, the Pipe() function and the pipeline() function, respectively.
mtcars$mpg %>>%
sample(size = 10000, replace = TRUE) %>>%
density(kernel = "gaussian") %>>%
plot(col = "red", main = "density of mpg (bootstrap)")
Pipe()
Pipe(mtcars$mpg)$
sample(size = 10000, replace = TRUE)$
density(kernel = "gaussian")$
plot(col = "red", main = "density of mpg (bootstrap)")
pipeline(mtcars$mpg,
sample(size = 10000, replace = TRUE),
density(kernel = "gaussian"),
plot(col = "red", main = "density of mpg (bootstrap)"))
pipeline({
mtcars$mpg
sample(size = 10000, replace = TRUE)
density(kernel = "gaussian")
plot(col = "red", main = "density of mpg (bootstrap)")
})
%>>%
The Pipe operator %>>% pipes the left-hand side value forward to the right-hand side expression, which is evaluated according to its syntax.
Many R functions are pipe-friendly: they take data as their first argument and transform it in a certain way. This arrangement allows operations to be streamlined by pipes. That is, one data source can be passed as the first argument of a function, get transformed, and then be passed as the first argument of the next function. In this way, a chain of commands is connected, and it is called a pipeline.
On the right-hand side of %>>%, whenever a function name or call is supplied, the left-hand side value is always passed as the first unnamed argument of that function.
rnorm(100) %>>%
plot
rnorm(100) %>>%
plot(col="red")
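The first-argument rule can be checked against ordinary function calls. The following is a minimal sketch, assuming pipeR is installed and attached; the variable names x, by_name and by_call are illustrative only.

```r
library(pipeR)

x <- c(4, 9, 16)  # illustrative input

# Piping to a function name: x becomes the first argument of sqrt()
by_name <- x %>>% sqrt

# Piping to a call: x is inserted ahead of the arguments already given
by_call <- x %>>% round(digits = 1)

# Both forms should match the equivalent nested calls
identical(by_name, sqrt(x))
identical(by_call, round(x, digits = 1))
```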
Sometimes the value on the left is needed in multiple places. One can use . to represent it anywhere in the function call.
rnorm(100) %>>%
plot(col="red", main=length(.))
There are situations where one calls a function in a namespace with ::. In this case, the call must end with ().
rnorm(100) %>>%
stats::median()
rnorm(100) %>>%
graphics::plot(col = "red")
Pipe to . in an expression
Not all functions are pipe-friendly in every case: some functions do not take the data produced by a pipeline as their first argument. In this case, you can enclose your expression in {} or () so that %>>% will use . to represent the value on the left.
mtcars %>>%
{ lm(mpg ~ cyl + wt, data = .) }
mtcars %>>%
( lm(mpg ~ cyl + wt, data = .) )
Sometimes, it may look confusing to use . to represent the value being piped. For example,
mtcars %>>%
(lm(mpg ~ ., data = .))
Although it works perfectly, it may look ambiguous when . has several meanings in one line of code.
%>>% accepts a lambda expression to direct its piping behavior. A lambda expression is characterized by a formula enclosed within (), for example, (x ~ f(x)). It contains a user-defined symbol to represent the value being piped and the expression to be evaluated.
mtcars %>>%
(df ~ lm(mpg ~ ., data = df))
mtcars %>>%
subset(select = c(mpg, wt, cyl)) %>>%
(x ~ plot(mpg ~ ., data = x))
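A smaller case makes the role of the user-defined symbol easier to see. This is a minimal sketch, assuming pipeR is attached; the symbol v and the variable sum_of_squares are arbitrary names chosen for illustration.

```r
library(pipeR)

# v stands for the piped value inside the lambda expression
sum_of_squares <- c(1, 2, 3) %>>% (v ~ sum(v^2))

sum_of_squares  # 1 + 4 + 9 = 14
```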
In a pipeline, one may be interested not only in the final outcome but sometimes also in intermediate results. Printing, plotting or saving an intermediate result must be done as a side-effect to avoid breaking the mainstream pipeline. For example, calling plot() to draw a scatter plot returns NULL, and if one directly calls plot() in the middle of a pipeline, it breaks the pipeline by changing the subsequent input to NULL.
A one-sided formula that starts with ~ indicates that the right-hand side expression is evaluated only for its side-effect: its value is ignored, and the input value is returned instead.
mtcars %>>%
subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
(~ cat("rows:",nrow(.),"\n")) %>>% # cat() returns NULL
summary
mtcars %>>%
subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
(~ plot(mpg ~ wt, data = .)) %>>% # plot() returns NULL
(lm(mpg ~ wt, data = .)) %>>%
summary()
With ~, side-effect operations can be easily distinguished from the mainstream pipeline.
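The pass-through behaviour can be verified on a small input. The sketch below assumes pipeR is attached; the variable name total is illustrative. The cat() step prints a message and returns NULL, yet the next step still receives the original vector.

```r
library(pipeR)

total <- 1:5 %>>%
  (~ cat("length:", length(.), "\n")) %>>%  # side-effect only; 1:5 is passed on
  sum

total  # 15, not an error caused by a NULL input
```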
An easier way to print the intermediate value is to use the (? expr) syntax, which reads like asking a question.
mtcars %>>%
(? ncol(.)) %>>%
summary
In addition to printing and plotting, one may need to save an intermediate value to the environment by assigning the value to a variable (symbol).
If one needs to assign the value to a symbol, just insert a step like (~ symbol); the input value of that step will then be assigned to symbol in the current environment.
mtcars %>>%
(lm(formula = mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary
If the value to be saved is not the input itself but the result of some transformation, one can use =, <-, or the more natural -> in a lambda expression to specify what is to be saved (thanks @yanlinlin82 for the suggestion).
mtcars %>>%
(~ summ = summary(.)) %>>% # side-effect assignment
(lm(formula = mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary
mtcars %>>%
(~ summary(.) -> summ)
mtcars %>>%
(~ summ <- summary(.))
An easier way to save an intermediate value that is to be piped further is to use the (symbol = expression) syntax:
mtcars %>>%
(~ summ = summary(.)) %>>% # side-effect assignment
(lm_mtcars = lm(formula = mpg ~ wt + cyl, data = .)) %>>% # continue piping
summary
or the (expression -> symbol) syntax:
mtcars %>>%
(~ summary(.) -> summ) %>>% # side-effect assignment
(lm(formula = mpg ~ wt + cyl, data = .) -> lm_mtcars) %>>% # continue piping
summary
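The three assignment forms described above can be compared side by side on a small vector. This is a minimal sketch, assuming pipeR is attached; input_copy, doubled, total and result are hypothetical names used only for illustration.

```r
library(pipeR)

result <- c(1, 2, 3) %>>%
  (~ input_copy) %>>%        # save the input itself; the input keeps flowing
  (~ doubled = . * 2) %>>%   # save a transformed value; the input keeps flowing
  (total = sum(.))           # save the result and pipe that result onward

input_copy  # the original vector
doubled     # the vector multiplied by 2
total       # the sum, which is also the value of result
```

Only the last step changes what flows down the pipeline; the two steps that start with ~ are side-effects.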
x %>>% (y) means extracting the element named y from object x, where y must be a valid symbol name and x can be a vector, list, environment or anything else for which [[]] is defined, or an S4 object.
mtcars %>>%
(lm(mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary %>>%
(r.squared)
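Element extraction can be compared with the usual $ access. The sketch below assumes pipeR is attached; settings, piped_r2 and direct_r2 are illustrative names, and the list contents are made up for the example.

```r
library(pipeR)

# Extracting a named element from a plain list
settings <- list(alpha = 0.05, method = "holm")  # hypothetical values
settings %>>% (method)

# Extracting from a model summary, compared with ordinary $ access
piped_r2 <- mtcars %>>%
  (lm(mpg ~ wt, data = .)) %>>%
  summary %>>%
  (r.squared)
direct_r2 <- summary(lm(mpg ~ wt, data = mtcars))$r.squared

all.equal(piped_r2, direct_r2)
```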
library(dplyr)
mtcars %>>%
filter(mpg <= mean(mpg)) %>>%
select(mpg, wt, cyl) %>>%
(~ plot(.)) %>>%
(model = lm(mpg ~ wt + cyl, data = .)) %>>%
(summ = summary(.)) %>>%
(coefficients)
library(ggvis)
mtcars %>>%
ggvis(~mpg, ~wt) %>>%
layer_points()
library(rlist)
1:100 %>>%
list.group(. %% 3) %>>%
list.mapv(g ~ mean(g))
Pipe()
Pipe() creates a Pipe object that supports light-weight chaining without any external operator. Typically, start with Pipe() and end with $value or [] to extract the final value of the Pipe.
The Pipe object provides an internal function .(...) that works in exactly the same way as x %>>% (...), and it has more features than %>>%.
NOTE: .() does not support assignment with = but supports ~, <- and ->.
Pipe(rnorm(1000))$
density(kernel = "cosine")$
plot(col = "blue")
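A chain built with Pipe() stays wrapped in a Pipe object until the value is extracted. This is a minimal sketch, assuming pipeR is attached; the variable names chained and nested are illustrative. It uses $value for the extraction; according to the text above, [] serves the same purpose.

```r
library(pipeR)

# Each $f() call passes the current value to f() as its first argument
chained <- Pipe(c(4, 9, 16))$sqrt()$sum()$value

# The same computation written as nested calls
nested <- sum(sqrt(c(4, 9, 16)))

identical(chained, nested)
```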
Pipe(mtcars)$
.(mpg)$
summary()
Pipe(mtcars)$
.(~ summary(.) -> summ)$
lm(formula = mpg ~ wt + cyl)$
summary()$
.(coefficients)
pmtcars <- Pipe(mtcars)
pmtcars[c("mpg","wt")]$
lm(formula = mpg ~ wt)$
summary()
pmtcars[["mpg"]]$mean()
plist <- Pipe(list(a=1,b=2))
plist$a <- 0
plist$b <- NULL
Pipe(mtcars)$
.(? ncol(.))$
.(~ plot(mpg ~ ., data = .))$ # side effect: plot
lm(formula = mpg ~ .)$
.(~ lm_mtcars)$ # side effect: assign
summary()
Pipe(mtcars)$
filter(mpg >= mean(mpg))$
select(mpg, wt, cyl)$
lm(formula = mpg ~ wt + cyl)$
summary()$
.(coefficients)$
value
Pipe(mtcars)$
ggvis(~ mpg, ~ wt)$
layer_points()
Pipe(1:100)$
list.group(. %% 3)$
list.mapv(g ~ mean(g))$
value
pipeline()
pipeline() provides argument-based and expression-based pipeline evaluation mechanisms. Its behavior depends on how its arguments are supplied. If only the first argument is supplied, it expects an expression enclosed in {} in which each line represents a pipeline step. If, instead, multiple arguments are supplied, it regards each argument as a pipeline step. For all pipeline steps, the expressions are transformed so that they are connected by %>>% and therefore behave exactly the same.
One notable difference is that in pipeline()’s arguments or expression, the special symbols that perform specially defined pipeline tasks (e.g. side-effect) do not need to be enclosed within () because no operator priority issues arise as they do when using %>>%.
pipeline({
mtcars
lm(formula = mpg ~ cyl + wt)
~ lmodel
summary
? .$r.squared
coef
})
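Because every step is rewritten into a %>>% chain, the three styles should give the same result. The following is a minimal sketch, assuming pipeR is attached; the names values, by_operator, by_arguments and by_expression are illustrative.

```r
library(pipeR)

values <- c(1, 4, 9)  # illustrative input

# Operator-based pipeline
by_operator <- values %>>% sqrt() %>>% sum()

# Argument-based pipeline: each argument is one step
by_arguments <- pipeline(values, sqrt(), sum())

# Expression-based pipeline: each line inside {} is one step
by_expression <- pipeline({
  values
  sqrt()
  sum()
})

c(by_operator, by_arguments, by_expression)  # each is 1 + 2 + 3 = 6
```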
Thanks @hoxo_m for the idea presented in this post.
This package is under MIT License.