The benthos
-package provides functions for analysing
benthic data sets. To use the benthos
package, some basic
knowledge of programming in R (R Core Team, 2017; www.r-project.org) is
assumed.
The functions in the benthos
-package have been designed
to integrate seamlessly with those of the dplyr
-package
(Wickham et al., 2017). The dplyr
-package
implements a grammar of data manipulation to make data analysis more
efficient and clear.
The benthos
-package is designed to use the forward-pipe
operator (%>%
) provided by the
magrittr
-package (Milton Bache & Wickham, 2014). This
operator can be used for chaining multiple data operations together.
This reduces the need for temporary variables or nested function calls
and leads to cleaner and more readable code. Consult the references
below and the corresponding package vignettes for more information.
The benthos
-package follows the same philosophy as the
dplyr
package: in stead of providing complicated functions
that can do many tasks (‘Swiss army knife’-functions), we provide a set
of ‘small functions that each do one thing well’ (Wickham &
Francois, 2017; ‘Introduction to dplyr
vignette’). As a
consequence, you will not find functions in this package which perform a
complete analysis, rather it provides basic building blocks that you can
use to build your own functions and applications.
The benthos
-package can be attached by means of
library(benthos)
In this vignette, we also attach the dplyr
,
tidyr
, readr
and ggplot2
packages
for data manipulation and visualization.
library(dplyr)
library(tidyr)
library(readr)
library(ggplot2)
In the sections below, we will illustrate the
benthos
-package by means of the Oosterschelde marine
benthos data set. This data set ships with the
benthos
-package and can be loaded by typing:
data(oosterschelde)
The first 10 records of these data set are given below:
oosterschelde# A tibble: 4,269 × 8
OBJECTID SAMPLEID DATE ID HABITAT AREA TAXON COUNT<chr> <int> <date> <int> <chr> <dbl> <chr> <int>
1 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Aphelochaeta marioni 1
2 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Crangon crangon 1
3 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Nephtys hombergii 4
4 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Oligochaeta 5
5 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Pygospio elegans 12
6 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Scoloplos armiger 1
7 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Spio martinensis 1
8 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Spiophanes bombyx 8
9 nl89_oostsde 14957 2010-09-06 2 Polyhaline-Subtidal 0.0157 Corophium arenarium 1
10 nl89_oostsde 14957 2010-09-06 2 Polyhaline-Subtidal 0.0157 Nephtys cirrosa 1
# … with 4,259 more rows
Type
?oosterschelde
to see the documentation of this data set.
Data preprocessing is an important first step. In this section we will demonstrate some preprocessing steps.
As a simple preprocessing step, we will only consider samples (stored in rows) taken in August and September. These can be selected as follows:
<- oosterschelde %>%
oosterschelde filter(months(DATE) %in% c("August", "September"))
Taxon names need to be standardized to comply with the names in the
WoRMS-database. The as_accepted
-function does this
conversion by using the TWN-list (https://taxainfo.nl/). This list is based on the WoRMS
database (World Register of Marine Species, https://www.marinespecies.org/).
We can use the is_accepted
-function to check if a taxon
complies with WoRMS:
%>%
oosterschelde mutate(COMPLIANT = is_accepted(taxon = TAXON)) %>%
select(OBJECTID, HABITAT, DATE, TAXON, COUNT, COMPLIANT)
# A tibble: 3,737 × 6
OBJECTID HABITAT DATE TAXON COUNT COMPLIANT<chr> <chr> <date> <chr> <int> <lgl>
1 nl89_oostsde Polyhaline-Subtidal 2010-09-06 Aphelochaeta marioni 1 TRUE
2 nl89_oostsde Polyhaline-Subtidal 2010-09-06 Crangon crangon 1 TRUE
3 nl89_oostsde Polyhaline-Subtidal 2010-09-06 Nephtys hombergii 4 TRUE
4 nl89_oostsde Polyhaline-Subtidal 2010-09-06 Oligochaeta 5 TRUE
5 nl89_oostsde Polyhaline-Subtidal 2010-09-06 Pygospio elegans 12 TRUE
6 nl89_oostsde Polyhaline-Subtidal 2010-09-06 Scoloplos armiger 1 TRUE
7 nl89_oostsde Polyhaline-Subtidal 2010-09-06 Spio martinensis 1 TRUE
8 nl89_oostsde Polyhaline-Subtidal 2010-09-06 Spiophanes bombyx 8 TRUE
9 nl89_oostsde Polyhaline-Subtidal 2010-09-06 Corophium arenarium 1 TRUE
10 nl89_oostsde Polyhaline-Subtidal 2010-09-06 Nephtys cirrosa 1 TRUE
# … with 3,727 more rows
This returns a logical vector with TRUE
if a taxon
complies with WoRMS and FALSE
otherwise. The total number
of records, compliant names, and missing names is given below
%>%
oosterschelde mutate(COMPLIANT = is_accepted(taxon = TAXON)) %>%
summarise(
N_RECORDS = n(),
N_COMPLIANT = sum(COMPLIANT),
N_MISSING = N_RECORDS - N_COMPLIANT
)# A tibble: 1 × 3
N_RECORDS N_COMPLIANT N_MISSING<int> <int> <int>
1 3737 3718 19
Taxa not compliant with WoRMS (if any) are given below and will be removed:
%>%
oosterschelde filter(!is_accepted(taxon = TAXON))
# A tibble: 19 × 8
OBJECTID SAMPLEID DATE ID HABITAT AREA TAXON COUNT<chr> <int> <date> <int> <chr> <dbl> <chr> <int>
1 nl89_oostsde 14964 2010-09-06 8 Polyhaline-Subtidal 0.0774 Crustacea 1
2 nl89_oostsde 14972 2010-09-06 16 Polyhaline-Subtidal 0.0157 Insecta 1
3 nl89_oostsde 14983 2010-09-07 17 Polyhaline-Subtidal 0.0157 Insecta 1
4 nl89_oostsde 15005 2010-09-07 39 Polyhaline-Subtidal 0.0157 Crustacea 1
5 nl89_oostsde 15025 2010-09-10 59 Polyhaline-Intertidal 0.0157 Insecta 3
6 nl89_oostsde 15075 2010-09-24 73 Polyhaline-Intertidal 0.0157 Insecta 1
7 nl89_oostsde 15468 2011-08-18 134 Polyhaline-Intertidal 0.0157 Animalia 1
8 nl89_oostsde 15260 2011-09-08 199 Polyhaline-Subtidal 0.0157 Animalia 1
9 nl89_oostsde 15507 2011-09-13 205 Polyhaline-Intertidal 0.0157 Animalia 1
10 nl89_oostsde 15241 2011-09-13 206 Polyhaline-Intertidal 0.0157 Animalia 1
11 nl89_oostsde 15509 2011-09-14 215 Polyhaline-Intertidal 0.0157 Animalia 1
12 nl89_oostsde 15631 2012-08-29 295 Polyhaline-Subtidal 0.0774 Animalia 1
13 nl89_oostsde 15633 2012-08-27 297 Polyhaline-Subtidal 0.0157 Crustacea 1
14 nl89_oostsde 15651 2012-08-29 315 Polyhaline-Subtidal 0.0157 Animalia 1
15 nl89_oostsde 15675 2012-08-27 339 Polyhaline-Subtidal 0.0157 Animalia 1
16 nl89_oostsde 15707 2012-09-18 369 Polyhaline-Intertidal 0.0157 Animalia 1
17 nl89_oostsde 15708 2012-09-06 370 Polyhaline-Intertidal 0.0157 Animalia 1
18 nl89_oostsde 15710 2012-08-20 372 Polyhaline-Intertidal 0.0157 Animalia 1
19 nl89_oostsde 15720 2012-08-20 382 Polyhaline-Intertidal 0.0157 Animalia 1
<- oosterschelde %>%
oosterschelde filter(is_accepted(taxon = TAXON))
Other examples of the usage of the is_accepted
and
as_accepted
-functions are:
is_accepted(taxon = "Petricola pholadiformis")
1] FALSE
[as_accepted(taxon = "Petricola pholadiformis")
1] "Petricolaria pholadiformis"
[is_accepted(taxon = "Petricolaria pholadiformis")
1] TRUE [
If we want to make sure that all taxa names comply with WoRMS we simply use:
<- oosterschelde %>%
oosterschelde mutate(TAXON = as_accepted(TAXON))
Taxon names that are not in the WoRMS/TWN-list get name
NA
(not available):
%>%
oosterschelde filter(!is_accepted(taxon = TAXON)) %>%
nrow1] 0 [
In our case all names comply with those in WoRMS.
Genus to species conversion reallocates the counts of taxa that are identified at the genus level to taxa in the same sampling unit and of the same genus but that are identified on the species level. The redistribution of counts is proportional to the number of counts of taxa at the species level (Van Loon et al., 2015).
The Oosterschelde data set only contains individuals at the genus and species level (individuals at higher order taxonomic levels have been removed for didactic purposes only).
It is convenient to split each taxon into its generic and specific name. This can be accomplished as follows:
<- oosterschelde %>%
oosterschelde mutate(
GENERIC = generic_name(TAXON),
SPECIFIC = specific_name(TAXON)
)
oosterschelde# A tibble: 3,718 × 10
OBJECTID SAMPLEID DATE ID HABITAT AREA TAXON COUNT GENERIC SPECI…¹<chr> <int> <date> <int> <chr> <dbl> <chr> <int> <chr> <chr>
1 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Apheloch… 1 Aphelo… marioni
2 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Crangon … 1 Crangon crangon
3 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Nephtys … 4 Nephtys homber…
4 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Oligocha… 5 <NA> <NA>
5 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Pygospio… 12 Pygosp… elegans
6 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Scoloplo… 1 Scolop… armiger
7 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Spio mar… 1 Spio martin…
8 nl89_oostsde 14956 2010-09-06 1 Polyhaline-Subtidal 0.0157 Spiophan… 8 Spioph… bombyx
9 nl89_oostsde 14957 2010-09-06 2 Polyhaline-Subtidal 0.0157 Corophiu… 1 Coroph… arenar…
10 nl89_oostsde 14957 2010-09-06 2 Polyhaline-Subtidal 0.0157 Nephtys … 1 Nephtys cirrosa
# … with 3,708 more rows, and abbreviated variable name ¹SPECIFIC
Both functions generic_name
and
specific_name
return NA
if TAXON is not a
valid binomial. That is the case if a taxon has only been identified on
the genus level. For example:
is_binomen("Nephtys")
1] FALSE
[is_binomen("Nephtys cirrosa")
1] TRUE [
We will create a new column to indicate these cases.
<- oosterschelde %>%
oosterschelde mutate(
IS_GENUS = is.na(GENERIC),
GENERIC = ifelse(IS_GENUS, TAXON, GENERIC)
)
The number of taxa that has been identified at the genus level is
%>%
oosterschelde filter(IS_GENUS) %>%
nrow1] 1426 [
Genus to species conversion is performed for each genus in a sample
by means of the genus_to_species
function:
<- oosterschelde %>%
oosterschelde group_by(ID, GENERIC) %>%
mutate(NEWCOUNT = genus_to_species(is_genus = IS_GENUS, count = COUNT)) %>%
ungroup
Corophium arenarium (left) and Corophium volutator (right) (source: https://nature22.com/)
To illustrate the algorithm, consider all records with generic name Corophium in sample 130:
%>%
oosterschelde filter(GENERIC == "Corophium", ID == 130) %>%
arrange(TAXON)
# A tibble: 3 × 12
OBJECTID SAMPLEID DATE ID HABITAT AREA TAXON COUNT GENERIC SPECI…¹ IS_GE…² NEWCO…³<chr> <int> <date> <int> <chr> <dbl> <chr> <int> <chr> <chr> <lgl> <dbl>
1 nl89_oostsde 15432 2011-08-18 130 Polyhal… 0.0236 Coro… 5 Coroph… <NA> TRUE 0
2 nl89_oostsde 15432 2011-08-18 130 Polyhal… 0.0236 Coro… 17 Coroph… arenar… FALSE 21.5
3 nl89_oostsde 15432 2011-08-18 130 Polyhal… 0.0236 Coro… 2 Coroph… voluta… FALSE 2.53
# … with abbreviated variable names ¹SPECIFIC, ²IS_GENUS, ³NEWCOUNT
In this sample, the genus Corophium is identified 19 times at the species level, i.e., 17 times as Corophium arenarium and twice as Corophium volutator. For five individuals, the analyst was unable to identify the species name, and only reported the genus. The genus to species algorithm now proportionally reallocates the \(5\) individuals at the genus level to the taxa at the species level. That is, an additional \(5 \times \frac{17}{17 + 2} = 4.47\) individuals will be classified as Corophium arenarium and \(5 \times \frac{2}{17 + 2} = 0.53\) as Corophium volutator.
Note that the total number of species will not be affected:
%>%
oosterschelde summarise(sum(COUNT), sum(NEWCOUNT))
# A tibble: 1 × 2
`sum(COUNT)` `sum(NEWCOUNT)`
<int> <dbl>
1 33079 33079
To finalize our analysis, we will set COUNT to the value of NEWCOUNT, and remove redundant columns and records.
<- oosterschelde %>%
oosterschelde mutate(COUNT = NEWCOUNT) %>%
select(ID, HABITAT, AREA, DATE, TAXON, COUNT) %>%
filter(COUNT > 0)
oosterschelde# A tibble: 3,562 × 6
ID HABITAT AREA DATE TAXON COUNT<int> <chr> <dbl> <date> <chr> <dbl>
1 1 Polyhaline-Subtidal 0.0157 2010-09-06 Aphelochaeta marioni 1
2 1 Polyhaline-Subtidal 0.0157 2010-09-06 Crangon crangon 1
3 1 Polyhaline-Subtidal 0.0157 2010-09-06 Nephtys hombergii 4
4 1 Polyhaline-Subtidal 0.0157 2010-09-06 Oligochaeta 5
5 1 Polyhaline-Subtidal 0.0157 2010-09-06 Pygospio elegans 12
6 1 Polyhaline-Subtidal 0.0157 2010-09-06 Scoloplos armiger 1
7 1 Polyhaline-Subtidal 0.0157 2010-09-06 Spio martinensis 1
8 1 Polyhaline-Subtidal 0.0157 2010-09-06 Spiophanes bombyx 8
9 2 Polyhaline-Subtidal 0.0157 2010-09-06 Corophium arenarium 1
10 2 Polyhaline-Subtidal 0.0157 2010-09-06 Nephtys cirrosa 1
# … with 3,552 more rows
Analysis results make only sense when all sampling units are collected on the same support. That is not the case for the oosterschelde data:
<- oosterschelde %>%
d group_by(AREA) %>%
summarise(n = n())
d# A tibble: 3 × 2
AREA n<dbl> <int>
1 0.0157 3229
2 0.0236 81
3 0.0774 252
We distinguish three different supports (0.0157, 0.0236, 0.0774 m2). In this section, we will demonstrate how to pool these data to approximately the same support in the range from 0.09 to 0.11 m2. See Van Loon et al. (2015) for more details.
We will only pool samples of the same year, so we’ll start by adding a new column to our table containing the year:
<- oosterschelde %>%
oosterschelde mutate(YEAR = DATE %>% format("%Y"))
Next, we will randomly pool samples for each HABITAT and YEAR. We
will pool several times (in the example below
n_pool_runs = 10
), to reduce the effect of pool composition
and to make sure that each sample will be represented in a pool (no
leftovers on average).
<- 10
n_pool_runs <- replicate(
pool_runs n = n_pool_runs, {
%>%
oosterschelde group_by(HABITAT, YEAR) %>%
mutate(
POOLID = pool(
sample_id = ID,
area = AREA,
target_area = c(0.09, 0.11)
)%>%
) %>%
ungroup select(POOLID)
} )
This procedure will return al list of pool identifiers (POOLID) for each pool run:
names(pool_runs) <- paste0("POOLRUN", 1:n_pool_runs)
<- pool_runs %>% as_tibble
pool_runs
pool_runs# A tibble: 3,562 × 10
POOLRUN1 POOLRUN2 POOLRUN3 POOLRUN4 POOLRUN5 POOLRUN6 POOLRUN7 POOLRUN8 POOLRUN9 POOLRUN10<int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 7 NA 3 1 2 NA 10 NA NA NA
2 7 NA 3 1 2 NA 10 NA NA NA
3 7 NA 3 1 2 NA 10 NA NA NA
4 7 NA 3 1 2 NA 10 NA NA NA
5 7 NA 3 1 2 NA 10 NA NA NA
6 7 NA 3 1 2 NA 10 NA NA NA
7 7 NA 3 1 2 NA 10 NA NA NA
8 7 NA 3 1 2 NA 10 NA NA NA
9 9 2 6 3 2 4 8 3 8 10
10 9 2 6 3 2 4 8 3 8 10
# … with 3,552 more rows
Each row in this table corresponds to the row with the same index in
oosterschelde
. Therefore, it is quite easy to combine this
table with the ‘oosterschelde’ data:
<- oosterschelde
oosterschelde_orig <- oosterschelde %>%
oosterschelde bind_cols(pool_runs) %>%
as_tibble
oosterschelde# A tibble: 3,562 × 17
ID HABITAT AREA DATE TAXON COUNT YEAR POOLR…¹ POOLR…² POOLR…³ POOLR…⁴ POOLR…⁵ POOLR…⁶<int> <chr> <dbl> <date> <chr> <dbl> <chr> <int> <int> <int> <int> <int> <int>
1 1 Polyha… 0.0157 2010-09-06 Aphe… 1 2010 7 NA 3 1 2 NA
2 1 Polyha… 0.0157 2010-09-06 Cran… 1 2010 7 NA 3 1 2 NA
3 1 Polyha… 0.0157 2010-09-06 Neph… 4 2010 7 NA 3 1 2 NA
4 1 Polyha… 0.0157 2010-09-06 Olig… 5 2010 7 NA 3 1 2 NA
5 1 Polyha… 0.0157 2010-09-06 Pygo… 12 2010 7 NA 3 1 2 NA
6 1 Polyha… 0.0157 2010-09-06 Scol… 1 2010 7 NA 3 1 2 NA
7 1 Polyha… 0.0157 2010-09-06 Spio… 1 2010 7 NA 3 1 2 NA
8 1 Polyha… 0.0157 2010-09-06 Spio… 8 2010 7 NA 3 1 2 NA
9 2 Polyha… 0.0157 2010-09-06 Coro… 1 2010 9 2 6 3 2 4
10 2 Polyha… 0.0157 2010-09-06 Neph… 1 2010 9 2 6 3 2 4
# … with 3,552 more rows, 4 more variables: POOLRUN7 <int>, POOLRUN8 <int>, POOLRUN9 <int>,
# POOLRUN10 <int>, and abbreviated variable names ¹POOLRUN1, ²POOLRUN2, ³POOLRUN3, ⁴POOLRUN4,
# ⁵POOLRUN5, ⁶POOLRUN6
It was not always possible to use all samples for pooling. These
‘leftover’ samples have POOLID NA
. However, on average each
sample has been pooled between 5 and 10 times.
For further analysis, it is convenient to convert this table from
‘wide’ to ‘long’-format. This can be done efficiently by functions of
the tidyr
-package (Wickham, 2017), which provides an
interesting framework to tidy your data (Wickham, 2014).
<- oosterschelde %>%
oosterschelde gather(key = "POOLRUN", value = "POOLID", starts_with("POOLRUN")) %>%
mutate(POOLRUN = parse_number(POOLRUN) %>% as.integer) %>%
filter(!is.na(POOLID)) %>%
select(POOLRUN, POOLID, HABITAT, AREA, YEAR, ID, TAXON, COUNT)
oosterschelde# A tibble: 33,439 × 8
POOLRUN POOLID HABITAT AREA YEAR ID TAXON COUNT<int> <int> <chr> <dbl> <chr> <int> <chr> <dbl>
1 1 7 Polyhaline-Subtidal 0.0157 2010 1 Aphelochaeta marioni 1
2 1 7 Polyhaline-Subtidal 0.0157 2010 1 Crangon crangon 1
3 1 7 Polyhaline-Subtidal 0.0157 2010 1 Nephtys hombergii 4
4 1 7 Polyhaline-Subtidal 0.0157 2010 1 Oligochaeta 5
5 1 7 Polyhaline-Subtidal 0.0157 2010 1 Pygospio elegans 12
6 1 7 Polyhaline-Subtidal 0.0157 2010 1 Scoloplos armiger 1
7 1 7 Polyhaline-Subtidal 0.0157 2010 1 Spio martinensis 1
8 1 7 Polyhaline-Subtidal 0.0157 2010 1 Spiophanes bombyx 8
9 1 9 Polyhaline-Subtidal 0.0157 2010 2 Corophium arenarium 1
10 1 9 Polyhaline-Subtidal 0.0157 2010 2 Nephtys cirrosa 1
# … with 33,429 more rows
To check if the pooling algorithm succeeded in its task, we compute the area of each pool. These areas should vary between 0.09 and 0.11 m2, i.e., our target area.
<- oosterschelde %>%
d group_by(HABITAT, YEAR, POOLRUN, POOLID) %>%
select(ID, AREA) %>%
distinct(ID, AREA) %>%
summarise(AREA = sum(AREA))
: `HABITAT`, `YEAR`, `POOLRUN`, `POOLID`
Adding missing grouping variables`summarise()` has grouped output by 'HABITAT', 'YEAR', 'POOLRUN'. You can override using the
`.groups` argument.
d# A tibble: 569 × 5
# Groups: HABITAT, YEAR, POOLRUN [60]
HABITAT YEAR POOLRUN POOLID AREA<chr> <chr> <int> <int> <dbl>
1 Polyhaline-Intertidal 2010 1 1 0.0942
2 Polyhaline-Intertidal 2010 1 2 0.0942
3 Polyhaline-Intertidal 2010 1 3 0.0942
4 Polyhaline-Intertidal 2010 1 4 0.0942
5 Polyhaline-Intertidal 2010 1 5 0.0942
6 Polyhaline-Intertidal 2010 1 6 0.0942
7 Polyhaline-Intertidal 2010 1 7 0.0942
8 Polyhaline-Intertidal 2010 2 1 0.0942
9 Polyhaline-Intertidal 2010 2 2 0.0942
10 Polyhaline-Intertidal 2010 2 3 0.0942
# … with 559 more rows
The following code computes the frequencies of the available areas:
<- d %>%
d select(AREA) %>%
group_by(AREA) %>%
summarise(n = n()) %>%
arrange(AREA)
: `HABITAT`, `YEAR`, `POOLRUN`
Adding missing grouping variables
d# A tibble: 6 × 2
AREA n<dbl> <int>
1 0.0931 65
2 0.0942 421
3 0.0943 10
4 0.102 46
5 0.109 25
6 0.11 2
This is also visualized in the bar graph below. All the pooled areas are nicely within the target area demarcated by the red lines.
The pooled samples can be used to estimate biodiversity measures (see the next section for details). For instance, the species richness for each pool is given by:
<- oosterschelde %>%
d group_by(HABITAT, YEAR, POOLRUN, POOLID) %>%
summarise(S = species_richness(taxon = TAXON, count = COUNT))
`summarise()` has grouped output by 'HABITAT', 'YEAR', 'POOLRUN'. You can override using the
`.groups` argument.
d# A tibble: 569 × 5
# Groups: HABITAT, YEAR, POOLRUN [60]
HABITAT YEAR POOLRUN POOLID S<chr> <chr> <int> <int> <int>
1 Polyhaline-Intertidal 2010 1 1 24
2 Polyhaline-Intertidal 2010 1 2 41
3 Polyhaline-Intertidal 2010 1 3 21
4 Polyhaline-Intertidal 2010 1 4 27
5 Polyhaline-Intertidal 2010 1 5 31
6 Polyhaline-Intertidal 2010 1 6 28
7 Polyhaline-Intertidal 2010 1 7 28
8 Polyhaline-Intertidal 2010 2 1 25
9 Polyhaline-Intertidal 2010 2 2 25
10 Polyhaline-Intertidal 2010 2 3 29
# … with 559 more rows
The annual mean species richness for each habitat and year is given by:
<- d %>%
d group_by(HABITAT, YEAR) %>%
summarise(S = mean(S))
`summarise()` has grouped output by 'HABITAT'. You can override using the `.groups` argument.
d# A tibble: 6 × 3
# Groups: HABITAT [2]
HABITAT YEAR S<chr> <chr> <dbl>
1 Polyhaline-Intertidal 2010 28.6
2 Polyhaline-Intertidal 2011 34.8
3 Polyhaline-Intertidal 2012 32.3
4 Polyhaline-Subtidal 2010 35.6
5 Polyhaline-Subtidal 2011 46.7
6 Polyhaline-Subtidal 2012 46.9
Several biodiversity measures have been implemented in the
benthos
-package. In the sections below, we will demonstrate
how to calculate these measures. To simplify things, all analysis will
be performed on a single sampling unit:
<- oosterschelde %>%
d filter(HABITAT == "Polyhaline-Subtidal", YEAR == 2010, POOLRUN == 1, POOLID == 1) %>%
select(TAXON, COUNT) %>%
arrange(TAXON)
d# A tibble: 41 × 2
TAXON COUNT<chr> <dbl>
1 Angulus tenuis 1
2 Aphelochaeta marioni 1
3 Aphelochaeta marioni 25
4 Bathyporeia 2
5 Capitella capitata 2
6 Cossura longocirrata 1
7 Echinocardium cordatum 1
8 Ensis 1
9 Lanice conchilega 4
10 Magelona johnstoni 2
# … with 31 more rows
The total abundance is the total number of individuals in a sampling unit, and is computed by:
%>% total_abundance(count = COUNT)
d 1] 131 [
The abundance is the total number of individuals per taxon in a
sampling unit. It can be computed by means of the
abundance
-function:
%>% abundance(taxon = TAXON, count = COUNT) %>% as.matrix
d 1]
[,1
Angulus tenuis 26
Aphelochaeta marioni 2
Bathyporeia 2
Capitella capitata 1
Cossura longocirrata 1
Echinocardium cordatum 1
Ensis 4
Lanice conchilega 3
Magelona johnstoni 1
Magelona papillicornis 1
Malmgreniella lunulata 1
Mytilus edulis 1
Nemertea 8
Nephtys 1
Nephtys hombergii 8
Oligochaeta 6
Ophiura albida 1
Peringia ulvae 1
Poecilochaetus serpens 1
Pseudopolydora pulchra 4
Retusa obtusa 22
Scoloplos armiger 1
Spio martinensis 17
Spiophanes bombyx 1
Spisula 1
Streblospio benedicti 1
Tellimya ferruginosa 1
Terebellidae 12 Urothoe poseidonis
Species richness \(S\) is the number of different species in a (pooled) sample. It can be computed by means of
%>% species_richness(taxon = TAXON, count = COUNT)
d 1] 29 [
Species richness \(S\) is strongly dependent on sampling size. Margalef’s diversity index \(D_\mathrm{M}\) takes sampling size into account. It is given by \[ D_\mathrm{M} = \frac{S-1}{\ln(N)} \] where \(N\) is the total abundance, i.e, the total number of individuals in the sampling unit. In case \(N=1\), this index will be set to zero.
It can be computed for a specific sampling unit by:
%>% margalef(taxon = TAXON, count = COUNT)
d 1] 5.743357 [
Species richness \(S\) is strongly dependent on sampling size. Like Margalef’s diversity index \(D_\mathrm{M}\), Rygg’s index of diversity takes sampling size into account (Rygg, 2006). It is given by \[ SN = \frac{\ln{S}}{\ln(\ln(N))} \] where \(N\) is the total abundance, i.e, the total number of individuals in the sampling unit.
It can be computed for a specific sampling unit by:
%>% rygg(taxon = TAXON, count = COUNT)
d 1] 2.125603 [
Rygg’s index shows some inconsistencies for small N and S ((S=2, N=2), (S=2, N=3) and (S=3, N=3)). This is illustrated in the third figure below. As a reference, also Margalef’s index is given in the top figure.
The second figure shows a graph based on the adjusted version of Rygg’s index. It is given by:
\[ SNA = \frac{\ln{S}}{\ln(\ln(N+1)+1)} \]
The adjusted version of Rygg’s index can be computed by means of:
%>% rygg(taxon = TAXON, count = COUNT, adjusted = TRUE)
d 1] 1.900244 [
Hurlbert (1971) gives the expected number of species in a sample of n individuals selected at random (without replacement) from a collection of N individuals and S species:
\[ \mathrm{E}(S_n) = \sum_{i=1}^S \left[1 - \frac{\binom{N-N_i}{n}}{\binom{N}{n}} \right] \]
Contrary to species richness, this measure is not dependent on the number of individuals. It can be computed for a specific sampling unit by:
%>% hurlbert(taxon = TAXON, count = COUNT, n = 100)
d 1] 24.85011 [
\(\mathrm{E}(S_n)\) can be computed for \(n \in {1, 2, \dots, N}\), where \(N\) is the total abundance. This has been done in the figure below.
Note that \(\mathrm{E}(S_n)\) can be computed for \(n \leq N\). Extrapolation, i.e. \(n > N\), is not possible.
Simpson’s Measure of Concentration gives the probability that two individuals selected at random from a sample will belong to the same species. For an infinite sample Simpson’s Index is given by: \[ \lambda = \sum_{i=1}^S \pi_i^2 \] where \(\pi_i\) the proportion of the individuals in species \(i\). For a finite sample, Simpson’s index is: \[ L = \sum_{i=1}^S \frac{n_i (n_i-1)}{N (N-1)} \] where \(n_i\) the number of individuals in species \(i\) and \(N\) the total number of individuals.
The finite sample case can be computed by:
%>% simpson(taxon = TAXON, count = COUNT)
d 1] 0.09935408 [
Related to Simpson’s index is Hurlbert’s probability of inter-specific encounter (PIE). It gives the probability that two individuals selected at random (without replacement) from a sample will belong to different species (Hurlbert, 1971, p.579, Eq. 3): \[ \Delta_1 = \sum_{i=1}^S \left(\frac{N_i}{N}\right)\left(\frac{N-N_i}{N-1}\right) = \left(\frac{N}{N-1}\right)\Delta_2 \] where \(\Delta_2\) (Hurlbert, 1971, p.579, Eq. 4) is the probability that two individuals selected at random (with replacement) from a sample will belong to different species: \[ \Delta_2 = 1 - \sum_{i=1}^S \pi_i^2 \] where \(N_i\) is the number of individuals of the \(i\)th species in the community, \(N\) is the total number of individuals in the community, \(\pi_i = N_i/N\), and \(S\) is the number of species in the community.
Hurlbert’s PIE can be computed by means of:
%>% hpie(taxon = TAXON, count = COUNT)
d 1] 0.9006459 [
Note that it is the complement of Simpson’s Measure of Concentration (for finite sample sizes):
1 - d %>% simpson(taxon = TAXON, count = COUNT)
1] 0.9006459 [
Shannon’s index (or entropy) is given by:
\[ H' = -\sum_i p_i \log_2 p_i \] where \(p_i\) is the proportion of individuals found in taxon \(i\). It can be computed for a specific sampling unit by:
%>% shannon(taxon = TAXON, count = COUNT)
d 1] 3.818995 [
According to Hill (1973): ‘a diversity number is figuratively a measure of how many species are present if we examine the sample down to a certain depth among its rarities. If we examine superficially (e.g., by using \(N_2\)) we shall see only the more abundant species. If we look deeply e.g. by using \(N_0\) we shall see all the species present.’. His diversity number is given by: \[ N_a = \left(\sum_{i=1}^S p_i^a\right)^{1/(1-a)} \]
Depending on parameter \(a\), Hill’s numbers gradually give more weight to the rarest species (small \(a\)) or most common species (large \(a\)).
Special cases are:
%>% hill(taxon = TAXON, count = COUNT, a = 0)
d 1] 29
[%>% hill(taxon = TAXON, count = COUNT, a = 1)
d N_a(a=1) is undefined. Therefore N_a(lim a->1) will be returned
1] 14.11342
[%>% hill(taxon = TAXON, count = COUNT, a = 2)
d 1] 9.413604 [
or (efficient) short cuts:
%>% hill0(taxon = TAXON, count = COUNT)
d 1] 29
[%>% hill1(taxon = TAXON, count = COUNT)
d 1] 14.11342
[%>% hill2(taxon = TAXON, count = COUNT)
d 1] 9.413604 [
The figure below shows Hill’s Diversity Number as function of \(a\). From right to left, the focus is more and more on rare species.
Borja et al. (2000) introduced the Biotic Coefficient. The expression in their paper can be rewritten as: \[ c_\mathrm{b} = \frac{3}{2} \sum_{i=2}^5 (i-1) p_i \] where \(\mathrm{p}\) is a vector of length 5 containing the proportions of species in the sensitivity classes (I, II, III, IV, V) respectively.
It can be computed for a specific sampling unit by:
%>%
d ambi(taxon = TAXON, count = COUNT)
: `filter_()` was deprecated in dplyr 0.7.0.
Warning`filter()` instead.
Please use vignette('programming') for more help
See 8 hours.
This warning is displayed once every `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Call : `select_()` was deprecated in dplyr 0.7.0.
Warning`select()` instead.
Please use 8 hours.
This warning is displayed once every `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Call : `summarise_()` was deprecated in dplyr 0.7.0.
Warning`summarise()` instead.
Please use 8 hours.
This warning is displayed once every `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Call : `mutate_()` was deprecated in dplyr 0.7.0.
Warning`mutate()` instead.
Please use vignette('programming') for more help
See 8 hours.
This warning is displayed once every `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Call : `arrange_()` was deprecated in dplyr 0.7.0.
Warning`arrange()` instead.
Please use vignette('programming') for more help
See 8 hours.
This warning is displayed once every `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Call : `group_by_()` was deprecated in dplyr 0.7.0.
Warning`group_by()` instead.
Please use vignette('programming') for more help
See 8 hours.
This warning is displayed once every `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Call 1] 2.734615 [
The accuracy of the AMBI depends (among other things) on the number
of taxa for which a sensitivity group is available. The
has_ambi
function indicates if a group has been assigned to
a taxa or not: Taxa with an AMBI sensitivity group are
%>%
d mutate(HAS_GROUP = has_ambi(taxon = TAXON))
# A tibble: 41 × 3
TAXON COUNT HAS_GROUP<chr> <dbl> <lgl>
1 Angulus tenuis 1 TRUE
2 Aphelochaeta marioni 1 TRUE
3 Aphelochaeta marioni 25 TRUE
4 Bathyporeia 2 TRUE
5 Capitella capitata 2 TRUE
6 Cossura longocirrata 1 TRUE
7 Echinocardium cordatum 1 TRUE
8 Ensis 1 TRUE
9 Lanice conchilega 4 TRUE
10 Magelona johnstoni 2 TRUE
# … with 31 more rows
The percentage of the total abundance without an AMBI group is given below
%>%
d mutate(HAS_GROUP = has_ambi(taxon = TAXON)) %>%
summarise(percentage = 100 * sum(COUNT[!HAS_GROUP]) / sum(COUNT)) %>%
as.numeric1] 0.7633588 [
The infaunal trophic index (ITI) is calculated as: \[ \mathrm{ITI} = 100 \sum_{i=1}^3 \frac{(4-i)}{3} p_i \] where \(p_i\) is the proportion of species in class \(i\), where
See Gittenberger & van Loon (2013) for more information.
We can estimate the ITI by means of:
%>%
d iti(taxon = TAXON, count = COUNT)
1] 22.82051 [
The accuracy of the ITI depends (among other things) on the number of
taxa for which a sensitivity group is available. The
has_iti
function indicates if a group has been assigned to
a taxa or not: Taxa with an ITI sensitivity group are
%>%
d mutate(HAS_GROUP = has_iti(taxon = TAXON))
# A tibble: 41 × 3
TAXON COUNT HAS_GROUP<chr> <dbl> <lgl>
1 Angulus tenuis 1 TRUE
2 Aphelochaeta marioni 1 TRUE
3 Aphelochaeta marioni 25 TRUE
4 Bathyporeia 2 TRUE
5 Capitella capitata 2 TRUE
6 Cossura longocirrata 1 TRUE
7 Echinocardium cordatum 1 TRUE
8 Ensis 1 TRUE
9 Lanice conchilega 4 TRUE
10 Magelona johnstoni 2 TRUE
# … with 31 more rows
The percentage of the total abundance without an ITI group is given below
%>%
d mutate(HAS_GROUP = has_iti(taxon = TAXON)) %>%
summarise(percentage = 100 * sum(COUNT[!HAS_GROUP]) / sum(COUNT)) %>%
as.numeric1] 0.7633588 [
Multiple measures of biodiversity for a specified grouping of the data can be computed for all sampling units by means of:
%>%
oosterschelde group_by(HABITAT, YEAR, POOLRUN, POOLID) %>%
summarise(
N = total_abundance(count = COUNT),
S = species_richness(taxon = TAXON, count = COUNT),
D = margalef(taxon = TAXON, count = COUNT),
SN = rygg(taxon = TAXON, count = COUNT),
SNa = rygg(taxon = TAXON, count = COUNT, adjusted = TRUE),
H = shannon(taxon = TAXON, count = COUNT)
)`summarise()` has grouped output by 'HABITAT', 'YEAR', 'POOLRUN'. You can override using the
`.groups` argument.
# A tibble: 569 × 10
# Groups: HABITAT, YEAR, POOLRUN [60]
HABITAT YEAR POOLRUN POOLID N S D SN SNa H<chr> <chr> <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
1 Polyhaline-Intertidal 2010 1 1 519 24 3.68 1.73 1.60 2.41
2 Polyhaline-Intertidal 2010 1 2 616 41 6.23 2.00 1.85 3.64
3 Polyhaline-Intertidal 2010 1 3 370 21 3.38 1.71 1.57 2.67
4 Polyhaline-Intertidal 2010 1 4 283 27 4.61 1.90 1.74 3.58
5 Polyhaline-Intertidal 2010 1 5 515 31 4.80 1.87 1.73 2.79
6 Polyhaline-Intertidal 2010 1 6 368 28 4.57 1.88 1.72 3.45
7 Polyhaline-Intertidal 2010 1 7 595 28 4.23 1.80 1.67 3.16
8 Polyhaline-Intertidal 2010 2 1 500 25 3.86 1.76 1.63 2.66
9 Polyhaline-Intertidal 2010 2 2 413 25 3.98 1.79 1.65 3.33
10 Polyhaline-Intertidal 2010 2 3 394 29 4.69 1.88 1.73 3.83
# … with 559 more rows
or more concise:
%>%
oosterschelde group_by(HABITAT, YEAR, POOLRUN, POOLID) %>%
summarise(
N = total_abundance(., COUNT),
S = species_richness(., TAXON, COUNT),
D = margalef(., TAXON, COUNT),
SN = rygg(., TAXON, COUNT),
SNa = rygg(., TAXON, COUNT, adjusted = TRUE),
H = shannon(., TAXON, COUNT)
)`summarise()` has grouped output by 'HABITAT', 'YEAR', 'POOLRUN'. You can override using the
`.groups` argument.
# A tibble: 569 × 10
# Groups: HABITAT, YEAR, POOLRUN [60]
HABITAT YEAR POOLRUN POOLID N S D SN SNa H<chr> <chr> <int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
1 Polyhaline-Intertidal 2010 1 1 519 24 3.68 1.73 1.60 2.41
2 Polyhaline-Intertidal 2010 1 2 616 41 6.23 2.00 1.85 3.64
3 Polyhaline-Intertidal 2010 1 3 370 21 3.38 1.71 1.57 2.67
4 Polyhaline-Intertidal 2010 1 4 283 27 4.61 1.90 1.74 3.58
5 Polyhaline-Intertidal 2010 1 5 515 31 4.80 1.87 1.73 2.79
6 Polyhaline-Intertidal 2010 1 6 368 28 4.57 1.88 1.72 3.45
7 Polyhaline-Intertidal 2010 1 7 595 28 4.23 1.80 1.67 3.16
8 Polyhaline-Intertidal 2010 2 1 500 25 3.86 1.76 1.63 2.66
9 Polyhaline-Intertidal 2010 2 2 413 25 3.98 1.79 1.65 3.33
10 Polyhaline-Intertidal 2010 2 3 394 29 4.69 1.88 1.73 3.83
# … with 559 more rows
In the previous section we used \(10\) pool runs. But is this sufficient to stabilize the results? In this section we will demonstrate the effect of pooling on the average species richness. We will reuse large sections of the code given earlier in this document:
<- oosterschelde_orig
oosterschelde <- 100
n_pool_runs <- replicate(
pool_runs n = n_pool_runs, {
%>%
oosterschelde group_by(HABITAT, YEAR) %>%
mutate(
POOLID = pool(
sample_id = ID,
area = AREA,
target_area = c(0.09, 0.11)
)%>%
) %>%
ungroup select(POOLID)
}
)names(pool_runs) <- paste0("POOLRUN", 1:n_pool_runs)
<- pool_runs %>%
d %>%
as_tibble bind_cols(oosterschelde) %>%
%>%
as_tibble gather(key = "POOLRUN", value = "POOLID", starts_with("POOLRUN")) %>%
mutate(POOLRUN = parse_number(POOLRUN) %>% as.integer) %>%
filter(!is.na(POOLID)) %>%
select(POOLRUN, POOLID, HABITAT, AREA, YEAR, ID, TAXON, COUNT) %>%
group_by(HABITAT, YEAR, POOLRUN, POOLID) %>%
summarise(S = species_richness(taxon = TAXON, count = COUNT)) %>%
group_by(HABITAT, YEAR, POOLRUN) %>%
summarise(S = mean(S)) %>%
mutate(S_rm = cummean(S))
`summarise()` has grouped output by 'HABITAT', 'YEAR', 'POOLRUN'. You can override using the
`.groups` argument.
`summarise()` has grouped output by 'HABITAT', 'YEAR'. You can override using the `.groups`
argument.
d# A tibble: 600 × 5
# Groups: HABITAT, YEAR [6]
HABITAT YEAR POOLRUN S S_rm<chr> <chr> <int> <dbl> <dbl>
1 Polyhaline-Intertidal 2010 1 29.4 29.4
2 Polyhaline-Intertidal 2010 2 29.9 29.6
3 Polyhaline-Intertidal 2010 3 27 28.8
4 Polyhaline-Intertidal 2010 4 27.9 28.5
5 Polyhaline-Intertidal 2010 5 28.6 28.5
6 Polyhaline-Intertidal 2010 6 27.6 28.4
7 Polyhaline-Intertidal 2010 7 28.7 28.4
8 Polyhaline-Intertidal 2010 8 27.4 28.3
9 Polyhaline-Intertidal 2010 9 27.7 28.2
10 Polyhaline-Intertidal 2010 10 28.9 28.3
# … with 590 more rows
The results are given in the figure below. The blue dots give the species richness for each pool-run for each year and habitat. The red line is the running mean species richness. In general, it stabilizes quickly.
Borja, A., J. Franco and V. Perez (2000). A Marine Biotic Index to Establish the Ecological Quality of Soft-Bottom Benthos Within European Estuarine and Coastal Environments. Marine Pollution Bulletin 40: 1100-1114
Gittenberger A. and W. van Loon, (2013). Sensitivities of marine macrozoobenthos to environmental pressures in the Netherlands. Nederlandse Faunistische Mededelingen 41: 79-112.
Hill, M.O., 1973. Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology 54:427-432
Hurlbert, S.H., 1971. The Nonconcept of Species Diversity: A Critique and Alternative Parameters. Ecology 52:577-586.
Milton Bache, S. and H. Wickham (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5. https://CRAN.R-project.org/package=magrittr
Peet, R. K. 1974, The Measurement of Species Diversity. Annual Review of Ecology and Systematics 5:285-307.
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Rygg, B. (2006). Developing indices for quality-status classification of marine soft-bottom fauna in Norway. Norwegian Institute for Water Research, Oslo, Norway. NIVA Report SNO 5208-2006.
Van Loon, W.M.G.M. A.R. Boon, A. Gittenberger, D.J.J. Walvoort, M. Lavaleye, G.C.A. Duineveld, A.J. Verschoor, 2015. Application of the Benthic Ecosystem Quality Index 2 to benthos in Dutch transitional and coastal waters. Journal of Sea Research 103:1-13.
Wickham, H. (2014). Tidy data. The Journal of Statistical Software, vol. 59, 2014.
Wickham, H. (2017). tidyr: Easily Tidy Data with spread() and gather() Functions. R package version 0.7.2. https://CRAN.R-project.org/package=tidyr
Wickham, H. R. Francois, L. Henry and K. Müller (2017). dplyr: A Grammar of Data Manipulation. R package version 0.7.4. https://CRAN.R-project.org/package=dplyr