epitab
provides functionality for building contingency tables with a variety of additional configurations. It was initially designed for use in epidemiology, as an extension to the Epi::stat.table
function. However, by identifying core components of a descriptive table, it is flexible enough to be used in a variety of disciplines and situations. This vignette provides an overview of the types of tables that can be built using epitab
.
For demonstration purposes, a simulated data set representing an observational study of a disease will be used. This fictitious disease primarily affects elderly people, and not every patient receives first-line treatment. The disease itself comes in two variants: A and B.
set.seed(17)
treat <- data.frame(age=abs(rnorm(100, 60, 20)),
sex=factor(sample(c("M", "F"), 100, replace=T)),
variant=factor(sample(c("A", "B"), 100, replace=T)),
treated=factor(sample(c("Yes", "No"), 100, replace=T), levels=c("Yes", "No")))
treat$agebin <- cut(treat$age, breaks=c(0, 40, 60, 80, 9999),labels=c("0-40", "41-60", "61-80", "80+"))
age | sex | variant | treated | agebin |
---|---|---|---|---|
39.69983 | F | B | Yes | 0-40 |
58.40727 | F | A | No | 41-60 |
55.34026 | M | B | No | 41-60 |
43.65464 | F | B | No | 41-60 |
75.44182 | F | B | Yes | 61-80 |
56.68776 | M | B | Yes | 41-60 |
Contingency tables are useful tools for exploratory analysis of a data set, and highlight relationships between one or more independent variables and (typically one) outcome of interest. The example code below shows how to build a basic contingency table to view how treatment varies by age group and sex.
Both the independent and outcome variables are passed in through lists, where the column names must be quoted strings (thereby allowing for these tables to be used in automated scripts). The list entry labels are used to provide the column and row labels. The crosstab_funcs
argument specifies what summary measures should be calculated for each covariate / outcome combination. The freq
function calculates the frequency of each of these cells, and (optionally) provides the proportion in parentheses.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
data=treat)
## | | |Treated | |
## | |All |Yes |No |
## ---------------------------------------------------------------
## | | | | |
## |Total |100 |45 (100) |55 (100) |
## | | | | |
## Age |0-40 |13 (13) |5 (11.1) |8 (14.5) |
## |41-60 |34 (34) |12 (26.7) |22 (40) |
## |61-80 |39 (39) |21 (46.7) |18 (32.7) |
## |80+ |14 (14) |7 (15.6) |7 (12.7) |
## | | | | |
## Sex |F |49 (49) |23 (51.1) |26 (47.3) |
## |M |51 (51) |22 (48.9) |29 (52.7) |
Using this standard contingency table as a starting point, there are several ways to customise the table. The presence of the overall frequency column is controlled by the marginal
argument. There are also options to freq
that specify the formatting of the cross-tabulated frequencies (see ?freq
for more details).
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq(proportion = "none")),
marginal=FALSE,
data=treat)
## | |Treated | |
## | |Yes |No |
## -----------------------------------------
## | | | |
## |Total |45 |55 |
## | | | |
## Age |0-40 |5 |8 |
## |41-60 |12 |22 |
## |61-80 |21 |18 |
## |80+ |7 |7 |
## | | | |
## Sex |F |23 |26 |
## |M |22 |29 |
Note that multiple outcomes can be selected, although it still results in a 2-way contingency table between all the covariates and the outcomes independently. It is not currently possible to produce a 3-way contingency table.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated", "Variant"="variant"),
crosstab_funcs=list(freq()),
data=treat)
## | | |Treated | |Variant | |
## | |All |Yes |No |A |B |
## ---------------------------------------------------------------------------------------------
## | | | | | | |
## |Total |100 |45 (100) |55 (100) |55 (100) |45 (100) |
## | | | | | | |
## Age |0-40 |13 (13) |5 (11.1) |8 (14.5) |7 (12.7) |6 (13.3) |
## |41-60 |34 (34) |12 (26.7) |22 (40) |14 (25.5) |20 (44.4) |
## |61-80 |39 (39) |21 (46.7) |18 (32.7) |24 (43.6) |15 (33.3) |
## |80+ |14 (14) |7 (15.6) |7 (12.7) |10 (18.2) |4 (8.9) |
## | | | | | | |
## Sex |F |49 (49) |23 (51.1) |26 (47.3) |29 (52.7) |20 (44.4) |
## |M |51 (51) |22 (48.9) |29 (52.7) |26 (47.3) |25 (55.6) |
Additional statistics can be added to these contingency tables in two ways. Column-wise measures act on each outcome in turn without regard to the covariates, while row-wise measures are those that are calculated for every level of each independent variable.
It is often the case that in addition to the categorical variables included in the contingency table, there are continuous attributes that we are interested in. The col_funcs
argument to contingency_table
calculates summary measures for each outcome and can be used for this purpose.
The example below shows how to calculate mean age across treatment types, using the provided function summary_mean
, to which the name of the continuous variable of interest is passed as a string.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age")),
data=treat)
## | | |Treated | |
## | |All |Yes |No |
## --------------------------------------------------------------------
## | | | | |
## |Total |100 |45 (100) |55 (100) |
## | | | | |
## Age |0-40 |13 (13) |5 (11.1) |8 (14.5) |
## |41-60 |34 (34) |12 (26.7) |22 (40) |
## |61-80 |39 (39) |21 (46.7) |18 (32.7) |
## |80+ |14 (14) |7 (15.6) |7 (12.7) |
## | | | | |
## Sex |F |49 (49) |23 (51.1) |26 (47.3) |
## |M |51 (51) |22 (48.9) |29 (52.7) |
## | | | | |
## Mean age | |60.05 |62.2 |58.28 |
As with crosstab_funcs
, multiple summary values can be passed to col_funcs
. The example below shows the use of the other column-wise function provided with epitab
: summary_median
.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age"),
"Median age"=summary_median("age")),
data=treat)
## | | |Treated | |
## | |All |Yes |No |
## ----------------------------------------------------------------------
## | | | | |
## |Total |100 |45 (100) |55 (100) |
## | | | | |
## Age |0-40 |13 (13) |5 (11.1) |8 (14.5) |
## |41-60 |34 (34) |12 (26.7) |22 (40) |
## |61-80 |39 (39) |21 (46.7) |18 (32.7) |
## |80+ |14 (14) |7 (15.6) |7 (12.7) |
## | | | | |
## Sex |F |49 (49) |23 (51.1) |26 (47.3) |
## |M |51 (51) |22 (48.9) |29 (52.7) |
## | | | | |
## Mean age | |60.05 |62.2 |58.28 |
## | | | | |
## Median age | |60.94 |62.79 |58.41 |
Another common addition is to display the coefficients of a regression model that relates the independent variables with an outcome (although not necessarily the same outcome displayed in the contingency table). For example, we may be interested to see how treatment varies by age group by looking at the odds ratios (ORs) of a univariate logistic regression. This functionality is provided by the row_funcs
argument to contingency_table
, which accepts a named list of functions that meet the correct requirements. The two functions provided with this package are odds_ratio
and hazard_ratio
, used to display coefficients resulting from logistic regression and Cox regression respectively.
The example below shows how to specify that the odds ratios should be calculated in addition to the cross-tabulated frequencies. The only required argument to odds_ratio
is the name of the outcome variable.
contingency_table(list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
data=treat,
crosstab_funcs=list(freq()),
row_funcs=list("OR"=odds_ratio('treated'))
)
## | | |Treated | | |
## | |All |Yes |No |OR |
## ---------------------------------------------------------------------------------------
## | | | | | |
## |Total |100 |45 (100) |55 (100) | |
## | | | | | |
## Age |0-40 |13 (13) |5 (11.1) |8 (14.5) |1 |
## |41-60 |34 (34) |12 (26.7) |22 (40) |1.15 (0.29 - 4.25) |
## |61-80 |39 (39) |21 (46.7) |18 (32.7) |0.54 (0.14 - 1.90) |
## |80+ |14 (14) |7 (15.6) |7 (12.7) |0.63 (0.13 - 2.87) |
## | | | | | |
## Sex |F |49 (49) |23 (51.1) |26 (47.3) |1 |
## |M |51 (51) |22 (48.9) |29 (52.7) |1.17 (0.53 - 2.58) |
Additional arguments to odds_ratio
allow the model to adjust for every other covariate included in independents
, specify the largest group as the baseline, and select whether to include confidence intervals. Note that multiple functions can be provided to row_funcs
. While the table below may not fit on the page of this document, it fits neatly within the standard R terminal output. Strategies for neatly displaying tables are discussed later on.
contingency_table(list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
data=treat,
crosstab_funcs=list(freq()),
row_funcs=list("OR"=odds_ratio('treated', relevel_baseline=TRUE),
"Adj OR"=odds_ratio('treated', adjusted=TRUE,
relevel_baseline=TRUE))
)
## | | |Treated | | | |
## | |All |Yes |No |OR |Adj OR |
## ---------------------------------------------------------------------------------------------------------------
## | | | | | | |
## |Total |100 |45 (100) |55 (100) | | |
## | | | | | | |
## Age |0-40 |13 (13) |5 (11.1) |8 (14.5) |1.87 (0.53 - 7.15) |1.84 (0.51 - 7.16) |
## |41-60 |34 (34) |12 (26.7) |22 (40) |2.14 (0.84 - 5.61) |2.12 (0.83 - 5.60) |
## |61-80 |39 (39) |21 (46.7) |18 (32.7) |1 |1 |
## |80+ |14 (14) |7 (15.6) |7 (12.7) |1.17 (0.34 - 4.03) |1.17 (0.34 - 4.03) |
## | | | | | | |
## Sex |F |49 (49) |23 (51.1) |26 (47.3) |0.86 (0.39 - 1.89) |0.95 (0.42 - 2.14) |
## |M |51 (51) |22 (48.9) |29 (52.7) |1 |1 |
Another use case in epidemiology is when survival is the outcome of interest. Such data is more appropriately modelled using Cox regression, which can be specified with the hazard_ratio
function. This requires the outcome to be specified as a string detailing a Surv
object, for example hazard_ratio("Surv(time, status)")
. See the help page ?hazard_ratio
for further details.
There is no limit to the number of column-wise and row-wise functions that can be supplied, although too many can hinder readability and detract from the purpose of the table.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age"),
"Median age"=summary_median("age")),
row_funcs=list("OR"=odds_ratio('treated', relevel_baseline=TRUE),
"Adj OR"=odds_ratio('treated', adjusted=TRUE,
relevel_baseline=TRUE)),
data=treat)
## | | |Treated | | | |
## | |All |Yes |No |OR |Adj OR |
## ----------------------------------------------------------------------------------------------------------------------
## | | | | | | |
## |Total |100 |45 (100) |55 (100) | | |
## | | | | | | |
## Age |0-40 |13 (13) |5 (11.1) |8 (14.5) |1.87 (0.53 - 7.15) |1.84 (0.51 - 7.16) |
## |41-60 |34 (34) |12 (26.7) |22 (40) |2.14 (0.84 - 5.61) |2.12 (0.83 - 5.60) |
## |61-80 |39 (39) |21 (46.7) |18 (32.7) |1 |1 |
## |80+ |14 (14) |7 (15.6) |7 (12.7) |1.17 (0.34 - 4.03) |1.17 (0.34 - 4.03) |
## | | | | | | |
## Sex |F |49 (49) |23 (51.1) |26 (47.3) |0.86 (0.39 - 1.89) |0.95 (0.42 - 2.14) |
## |M |51 (51) |22 (48.9) |29 (52.7) |1 |1 |
## | | | | | | |
## Mean age | |60.05 |62.2 |58.28 | | |
## | | | | | | |
## Median age | |60.94 |62.79 |58.41 | | |
This flexibility of epitab
allows for either simple summary tables that are used to highlight a trend within the data, or more complex reference tables that hold a large amount of summary statistics.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated", "Disease variant"="variant"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age"),
"Median age"=summary_median("age")),
row_funcs=list("Treatment OR"=odds_ratio('treated', relevel_baseline=TRUE),
"Disease variant OR"=odds_ratio('variant',
relevel_baseline=TRUE)),
data=treat)
## | | |Treated | |Disease variant | | | |
## | |All |Yes |No |A |B |Treatment OR |Disease variant OR |
## ----------------------------------------------------------------------------------------------------------------------------------------------------------
## | | | | | | | | |
## |Total |100 |45 (100) |55 (100) |55 (100) |45 (100) | | |
## | | | | | | | | |
## Age |0-40 |13 (13) |5 (11.1) |8 (14.5) |7 (12.7) |6 (13.3) |1.87 (0.53 - 7.15) |1.37 (0.38 - 4.92) |
## |41-60 |34 (34) |12 (26.7) |22 (40) |14 (25.5) |20 (44.4) |2.14 (0.84 - 5.61) |2.29 (0.90 - 5.96) |
## |61-80 |39 (39) |21 (46.7) |18 (32.7) |24 (43.6) |15 (33.3) |1 |1 |
## |80+ |14 (14) |7 (15.6) |7 (12.7) |10 (18.2) |4 (8.9) |1.17 (0.34 - 4.03) |0.64 (0.15 - 2.30) |
## | | | | | | | | |
## Sex |F |49 (49) |23 (51.1) |26 (47.3) |29 (52.7) |20 (44.4) |0.86 (0.39 - 1.89) |0.72 (0.32 - 1.58) |
## |M |51 (51) |22 (48.9) |29 (52.7) |26 (47.3) |25 (55.6) |1 |1 |
## | | | | | | | | |
## Mean age | |60.05 |62.2 |58.28 |62.27 |57.32 | | |
## | | | | | | | | |
## Median age | |60.94 |62.79 |58.41 |63.76 |56.45 | | |
contingency_table
can even be used when there is no cross-tabulation, for example as a means of displaying regression coefficients.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
row_funcs=list("OR"=odds_ratio('treated', relevel_baseline=TRUE),
"Adj OR"=odds_ratio('treated', adjusted=TRUE,
relevel_baseline=TRUE)),
data=treat)
## | |OR |Adj OR |
## --------------------------------------------------------------------
## | | | |
## Age |0-40 |1.87 (0.53 - 7.15) |1.84 (0.51 - 7.16) |
## |41-60 | | |
## |61-80 |1 |1 |
## |80+ |1.17 (0.34 - 4.03) |1.17 (0.34 - 4.03) |
## | | | |
## Sex |F |0.86 (0.39 - 1.89) |0.95 (0.42 - 2.14) |
## |M |1 |1 |
The default print
method of these contingency tables is designed for a standard wide R console, where the entire table fits width-wise. However, for situations where a table is being produced for distribution or publication of any type, greater attention to detail and appearance is required. epitab
provides several options for exporting clean-looking tables.
neat_table
to HTML and PDFThe neat_table
function provided in epitab
builds a cleanly formatted table for output to HMTL or LaTeX, using knitr::kable
and the kableExtra
package. The output of neat_table
is a kable
object and so can be passed to kableExtra::kable_styling()
, allowing for the specification of various cosmetic settings. See the help files for both neat_table
and kableExtra::kable_styling
for further details.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
crosstab_funcs=list(freq()),
row_funcs=list("OR"=odds_ratio('treated'),
"Adj OR"=odds_ratio('treated', adjusted=TRUE)),
col_funcs=list("Mean age"=summary_mean("age")),
data=treat) %>%
neat_table() %>%
kableExtra::kable_styling(bootstrap_options=c("striped", "hover"),
full_width=FALSE)
All | Yes | No | OR | Adj OR | |
---|---|---|---|---|---|
Total | 100 | 45 (100) | 55 (100) | ||
Age | |||||
0-40 | 13 (13) | 5 (11.1) | 8 (14.5) | 1 | 1 |
41-60 | 34 (34) | 12 (26.7) | 22 (40) | 1.15 (0.29 - 4.25) | 1.15 (0.29 - 4.31) |
61-80 | 39 (39) | 21 (46.7) | 18 (32.7) | 0.54 (0.14 - 1.90) | 0.54 (0.14 - 1.96) |
80+ | 14 (14) | 7 (15.6) | 7 (12.7) | 0.63 (0.13 - 2.87) | 0.63 (0.13 - 2.96) |
Sex | |||||
F | 49 (49) | 23 (51.1) | 26 (47.3) | 1 | 1 |
M | 51 (51) | 22 (48.9) | 29 (52.7) | 1.17 (0.53 - 2.58) | 1.06 (0.47 - 2.39) |
Mean age | 60.05 | 62.2 | 58.28 |
Due to the vignette markdown theme, these styling changes won’t appear in this HTML. The screenshot below shows how the table appears when using the above code in the default Rmarkdown template.
For outputting to PDF documents using LaTeX, the same neat_table
function can be used, but now the latex
output format must be specified. Also it is highly recommended to use the booktabs
argument to produce far cleaner looking tables.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
crosstab_funcs=list(freq()),
row_funcs=list("OR"=odds_ratio('treated'),
"Adj OR"=odds_ratio('treated', adjusted=TRUE)),
col_funcs=list("Mean age"=summary_mean("age")),
data=treat) %>%
neat_table('latex', booktabs=TRUE) %>%
kableExtra::kable_styling(font_size=8)
The above call will display the table below using the default Rmarkdown template.
kable
If full control of the table appearance is required, then the raw character matrix is provided as the mat
attribute of the output of contingency_table
. It can be used in conjunction with knitr::kable
and kableExtra
. NB: The default value for format
in kable
is pandoc, which does not work well with epitab
, try html or markdown instead.
contingency_table(list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
crosstab_funcs=list(freq()),
row_funcs=list("OR"=odds_ratio('treated'),
"Adj OR"=odds_ratio('treated', adjusted=TRUE)),
data=treat)$mat %>%
knitr::kable("html") %>%
kableExtra::kable_styling(bootstrap_options="striped")
Treated | ||||||
All | Yes | No | OR | Adj OR | ||
Total | 100 | 45 (100) | 55 (100) | |||
Age | 0-40 | 13 (13) | 5 (11.1) | 8 (14.5) | 1 | 1 |
41-60 | 34 (34) | 12 (26.7) | 22 (40) | 1.15 (0.29 - 4.25) | 1.15 (0.29 - 4.31) | |
61-80 | 39 (39) | 21 (46.7) | 18 (32.7) | 0.54 (0.14 - 1.90) | 0.54 (0.14 - 1.96) | |
80+ | 14 (14) | 7 (15.6) | 7 (12.7) | 0.63 (0.13 - 2.87) | 0.63 (0.13 - 2.96) | |
Sex | F | 49 (49) | 23 (51.1) | 26 (47.3) | 1 | 1 |
M | 51 (51) | 22 (48.9) | 29 (52.7) | 1.17 (0.53 - 2.58) | 1.06 (0.47 - 2.39) |
Again note that the stylistic changes made above will not display in the vignette you are currently reading due to the vignette template, but they will appear in your Rmarkdown output as shown below.
Since Word is a proprietary format, it is challenging to directly embed tables into documents. The most convenient method to export a contingency table into Word involves the following steps:
tab <- contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age"),
"Median age"=summary_median("age")),
row_funcs=list("OR"=odds_ratio('treated', relevel_baseline=TRUE),
"Adj OR"=odds_ratio('treated', adjusted=TRUE,
relevel_baseline=TRUE)),
data=treat)
write.table(tab$mat, "mytable.csv", row.names=FALSE, col.names=FALSE, sep=',')
In the above examples, the summary functions used to build up the table in crosstab_funcs
, row_funcs
, and col_funcs
have been provided by epitab
. However, for greater flexibility, any correctly parametrised function can be supplied instead. This section details the appropriate form for each of these 3 arguments.
The functions passed in to crosstab_functions
are run for each combination of outcome and independent variable level.
Arguments:
data
: A subset of the full data, the strata of individuals with the current independent variable level.
outcome_level
: A string providing the current outcome level.outcome_name
: A string providing the current outcome variable.independent_level
: A string providing the current independent level.independent_name
: A string providing the current independent variable.The function must return a vector of length one, representing the statistic for this covariate-level / outcome-level pair.
The example function below calculates the proportion of each treatment type per covariate level (rather than also displaying the counts as freq
does).
proportion <- function(data, outcome_level=NULL, outcome_name=NULL, independent_level=NULL, independent_name=NULL) {
if (!is.null(outcome_level) && !is.null(outcome_name))
data <- data[data[[outcome_name]] == outcome_level, ]
count <- if (!is.null(independent_level) && !is.null(independent_name)) {
sum(data[[independent_name]] == independent_level)
} else {
nrow(data)
}
proportion <- count / nrow(data)
sprintf("%0.1f%%", proportion*100)
}
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(proportion),
data=treat)
## | | |Treated | |
## | |All |Yes |No |
## --------------------------------------------------------
## | | | | |
## |Total |100 |100.0% |100.0% |
## | | | | |
## Age |0-40 |13.0% |11.1% |14.5% |
## |41-60 |34.0% |26.7% |40.0% |
## |61-80 |39.0% |46.7% |32.7% |
## |80+ |14.0% |15.6% |12.7% |
## | | | | |
## Sex |F |49.0% |51.1% |47.3% |
## |M |51.0% |48.9% |52.7% |
The column-wise functions provided with epitab
are summary_mean
and summary_median
, and are used to investigate relationships between the outcome variables that aren’t necessarily associated with the categorical covariates.
Args:
data
: The full data set that was passed to contingency_table
.
outcome_level
: The current level of the outcome, as a string.outcome_name
: The current outcome, as a string.Returns:
The function must return a single value, representing the statistic for this outcome level.
The example function below extends summary_mean
by adding the standard deviation in parentheses. It is hard-coded to work for the continuous variable age
in the dummy data set.
meanage_sd <- function(data, outcome_level=NULL, outcome_name=NULL) {
if (!is.null(outcome_level) && !is.null(outcome_name))
data <- data[data[[outcome_name]] == outcome_level, ]
mean <- round(mean(data[['age']]), 2)
sd <- round(sd(data[['age']]), 2)
paste0(mean, " (", sd, ")")
}
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age"),
"Mean age (sd)"=meanage_sd),
data=treat)
## | | |Treated | |
## | |All |Yes |No |
## --------------------------------------------------------------------------------------
## | | | | |
## |Total |100 |45 (100) |55 (100) |
## | | | | |
## Age |0-40 |13 (13) |5 (11.1) |8 (14.5) |
## |41-60 |34 (34) |12 (26.7) |22 (40) |
## |61-80 |39 (39) |21 (46.7) |18 (32.7) |
## |80+ |14 (14) |7 (15.6) |7 (12.7) |
## | | | | |
## Sex |F |49 (49) |23 (51.1) |26 (47.3) |
## |M |51 (51) |22 (48.9) |29 (52.7) |
## | | | | |
## Mean age | |60.05 |62.2 |58.28 |
## | | | | |
## Mean age (sd) | |60.05 (20.13) |62.2 (20.44) |58.28 (19.89) |
Row-wise functions are used to estimate summary measures for the independent variables outside of the contingency table. This can be useful for providing a summary statistic that is not necessarily related to the outcome variables. This example will describe the case where we wish to run a linear regression on a continuous outcome, in particular, this example will regress on continuous age. This is not a particularly helpful analysis, since one of the covariates is age group, but it will serve as an example. These functions are run for each independent variable and must be parametrised as follows:
Args:
data
: The data set originally passed into contingency_table
.
var
: The current independent variable, as a string.all_vars
: All independent variables as passed into independents
. This is provided to allow regression models to adjust for other covariates.The function must return a vector with length equal to the number of levels of var
.
lr <- function(data, var=NULL, all_vars=NULL) {
if (is.null(var) || is.null(all_vars)) {
return("")
}
levs <- levels(data[[var]])
form <- as.formula(paste('age ~', var))
mod <- lm(form, data)
coefs <- c(coef(mod), 1) # Add baseline as 1
# coefficients are named with <variable><level>
labels <- paste0(var, levs)
# set baseline name in coefficients vector
names(coefs)[length(coefs)] <- labels[1]
round(coefs[labels], 3)
}
contingency_table(list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
data=treat,
crosstab_funcs=list(freq()),
row_funcs=list("Regression on age"=lr)
)
## | | |Treated | | |
## | |All |Yes |No |Regression on age |
## --------------------------------------------------------------------------------------
## | | | | | |
## |Total |100 |45 (100) |55 (100) | |
## | | | | | |
## Age |0-40 |13 (13) |5 (11.1) |8 (14.5) |1 |
## |41-60 |34 (34) |12 (26.7) |22 (40) |23.49 |
## |61-80 |39 (39) |21 (46.7) |18 (32.7) |42.379 |
## |80+ |14 (14) |7 (15.6) |7 (12.7) |65.6 |
## | | | | | |
## Sex |F |49 (49) |23 (51.1) |26 (47.3) |1 |
## |M |51 (51) |22 (48.9) |29 (52.7) |-6.563 |