The goal of silicate is to bridge planar geospatial data types with flexible mesh data structures and visualization.
We aim to provide
The core of silicate is a set of worker functions that are generic and work with any kind of data that has hierarchical structure. These functions work on models, which include various formats (sf, sp, trip), and also on silicate models themselves.
We have the following worker verbs, designed to work with many spatial data formats and with silicate’s own structures.
sc_object() - highest level properties, the “features”
sc_coord() - all instances of coordinates, labelled by vertex if the source model includes them
sc_vertex() - only unique coordinates (in some geometric space)
sc_path() - individual paths, sequential traces
sc_edge() - unique binary relations, unordered segments (segments and edges are currently under review, and may change)
sc_segment() - all instances of edges
sc_arc() - unique topological paths, arcs either meet two other arcs at a node, or include no nodes
sc_node() - unique nodes
The idea is that each function can return the underlying entities of a data object, no matter its underlying format. This interoperability design contrasts with major spatial packages that require format peculiarities in order to work, even when these details are not relevant.
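As a rough illustration of the verbs listed above, the following sketch applies a few of them to the minimal_mesh example data that ships with the package. It assumes silicate is installed (the whole block is guarded so it does nothing otherwise) and that these verbs accept the sf-format input directly; no particular output is claimed here.

```r
# Guarded sketch: runs only when the silicate package is available.
if (requireNamespace("silicate", quietly = TRUE)) {
  library(silicate)

  objects  <- sc_object(minimal_mesh)      # highest level properties, the "features"
  coords   <- sc_coord(minimal_mesh)       # every instance of a coordinate
  paths    <- sc_path(minimal_mesh)        # individual paths
  vertices <- sc_vertex(SC(minimal_mesh))  # unique coordinates of the SC model

  # Compare how many rows each level of the hierarchy holds.
  print(c(objects = nrow(objects), paths = nrow(paths),
          coords = nrow(coords), vertices = nrow(vertices)))
}
```

The same calls are intended to work regardless of the input format, which is the point of keeping the verbs generic.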
Silicate defines a number of key models to represent various interpretations of hierarchical (usually spatial) data. These are SC, PATH, ARC and TRI. Most models have a counterpart structurally-optimized version with a similar name: SC0, PATH0, TRI0. Other models are possible, and include DEL in the in-development anglr package that extends TRI.
Each model is composed of a type of primitive, and each provides a normalization (de-duplication) of geometry for efficiency and/or topology. Silicate quite deliberately separates the concepts of geometry and topology completely. Primitives define the topology (edges, paths, arcs, or triangles) and the vertex table defines the geometry. We reserve the names x_, y_, t_ (time) and z_ (elevation) for the usual geometric dimensions, and these are treated specially by default. No limit is put on geometric dimension, however; it’s possible to store anything at all on the vertex table. Some models include an object table, and this represents a higher level grouping of primitives (and corresponds to features in SF).
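The separation of geometry and topology can be sketched in base R without silicate at all. The data, the triangle_ column and the z_ values below are made up for illustration, and the layout is a simplification rather than the package's internal structure.

```r
# Hypothetical data: two triangles that share an edge, stored the
# "explicit" way with every coordinate repeated per triangle.
coords <- data.frame(
  x_ = c(0, 1, 0, 1, 1, 0),
  y_ = c(0, 0, 1, 0, 1, 1),
  triangle_ = rep(c("t1", "t2"), each = 3)
)

# Geometry: the vertex table holds each unique coordinate once.
vertex <- unique(coords[c("x_", "y_")])
rownames(vertex) <- NULL

# Topology: each triangle is three row numbers into the vertex table.
idx <- match(paste(coords$x_, coords$y_), paste(vertex$x_, vertex$y_))
triangle <- matrix(idx, ncol = 3, byrow = TRUE,
                   dimnames = list(c("t1", "t2"), NULL))

# Any extra attribute can sit on the vertex table next to x_ and y_.
vertex$z_ <- c(10, 12, 11, 15)

print(vertex)
print(triangle)
```

Six stored coordinates reduce to four vertices, and the shared edge is visible in the topology because both triangles refer to the same two vertex rows.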
The most general model is SC, composed of three tables, vertex, edge and object, and all entities are explicitly labelled. Indexes between tables are unique, persistent and arbitrary, and can be arbitrarily accessed. This is closely related to the more bare-bones SC0 model, composed of only two tables, vertices and objects. These are related structurally by nesting the relations within the object table. Here the relations are not persistent, so we can subset the objects but we cannot change the vertex table without updating these indexes.
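The difference between labelled and structural indexes can be shown with a small base R sketch. The table layouts, labels and the topology_ column name are assumptions chosen for illustration and are not taken from the package's internals.

```r
# SC-like: relations use explicit labels (hypothetical labels "a", "b", "c").
vertex <- data.frame(x_ = c(0, 1, 1), y_ = c(0, 0, 1),
                     vertex_ = c("a", "b", "c"))
edge <- data.frame(.vx0 = c("a", "b"), .vx1 = c("b", "c"),
                   object_ = "o1")

# SC0-like: no labels; the object table nests row numbers into vertex.
vertex0 <- vertex[c("x_", "y_")]
object0 <- data.frame(id = 1)
object0$topology_ <- list(data.frame(.vx0 = c(1L, 2L), .vx1 = c(2L, 3L)))

# Reorder the vertex table and look up the start of each edge again.
ord <- c(3, 1, 2)
by_label <- vertex[ord, ]$x_[match(edge$.vx0, vertex$vertex_[ord])]
by_index <- vertex0[ord, ]$x_[object0$topology_[[1]]$.vx0]

print(by_label)  # labels still find the original coordinates
print(by_index)  # row numbers now point at the wrong rows
```

Labels survive any reordering or subsetting of the vertex table, while row-number indexes are more compact but must be rebuilt whenever the vertex table changes.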
SC0 can deal with 0-dimensional topology types (points) as well as 1-dimensional types (edges), but SC is strictly for edges.
Further models PATH, ARC, and TRI cover a broad range of complex types, and each is fundamental and distinct from the others. SC can be used to represent any model, but other models provide a better match to specific use-cases, intermediate forms and serve to expand the relationships between the model types.
SC is the universal model, composed of binary relationships, edges defined by pairs of vertices (a structural primitive model)
TRI also a structural primitive model, for triangulations
PATH a sequential model, for the standard spatial vector types, shapes defined by paths
ARC a sequential model, for arc-node topology, a shared-boundary decomposition of path models
SC0 is a stripped down structural model analogous to SC, there are only implicit relations of object to vertices, with a nested list of edge indexes
The models PATH0 and ARC0 are in-development. By analogy to SC0 they will be composed of two tables, object and vertex, with nested structural-index tables on object holding the path and arc indexes that are row numbers of vertex. It’s not clear yet if this vertex table should be de-duplicated.
Earlier versions included a mix of these models, and the definitions have changed many times. This is still a work-in-progress.
An extension of the TRI model, DEL, is provided in anglr, which builds high-quality triangulations, but the structural representation is the same.
Each model is created by using a set of generic verbs that extract the underlying elements of a given model. This design means that the models themselves are completely generic, and methods for worker verbs can be defined as needed for a given context. Our ideal situation would be for external packages to publish methods for these verbs, keeping package-specific code in the original package. We think this provides a very powerful and general mechanism for a family of consistent packages.
There is another important function, unjoin(), used to normalize tables that have redundant information. The unjoin() is the opposite of the database join, and has a nearly identical counterpart in the dm package with its decompose_table(). Unjoin is the same as tidyr::nest() but returns two tables rather than splitting one into the rows of the other.
The unjoin is a bit out of place here, but it’s a key step when building these models, used to remove duplication at various levels. It’s the primary mechanism for defining and building-in topology, which is precisely the relationships between entities in a model. This function is published in the CRAN package unjoin.
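The idea behind an unjoin can be sketched in base R. The helper unjoin_sketch() and the key name idx_ below are hypothetical and written only for illustration; this is not the implementation or the exact interface of the unjoin package.

```r
# Hypothetical helper: split `data` into a table of unique `cols`
# combinations plus the remaining columns, linked by an integer key.
unjoin_sketch <- function(data, cols, key = "idx_") {
  main <- unique(data[cols])
  rownames(main) <- NULL
  main[[key]] <- seq_len(nrow(main))
  # Label every original row with the key of its unique combination.
  id <- match(do.call(paste, c(data[cols], sep = "\r")),
              do.call(paste, c(main[cols], sep = "\r")))
  rest <- data[setdiff(names(data), cols)]
  rest[[key]] <- id
  list(main = main, data = rest)
}

# A closed ring: the first coordinate is repeated as the last.
coords <- data.frame(x_ = c(0, 1, 1, 0, 0),
                     y_ = c(0, 0, 1, 1, 0),
                     path_ = 1L)

out <- unjoin_sketch(coords, c("x_", "y_"))
print(out$main)  # unique vertices with their key
print(out$data)  # one row per coordinate instance, pointing at a vertex

# A database join on the key brings back one row per original coordinate.
rejoined <- merge(out$data, out$main, by = "idx_")
```

The repeated closing coordinate becomes a repeated key rather than a repeated pair of values, which is how the relationship between the start and end of the ring gets recorded.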
The common “well-known” formats of encoding geometry (WKB/WKT for binary/text) represent (pre-)aggregated data, yet the input levels of aggregation are often not directly relevant to desired or desirable levels of aggregation for analysis. A key stage in many GIS analyses is thus an initial disaggregation to some kind of atomic form followed by re-aggregation.
We propose a common form for spatial data that is inherently disaggregated, that allows for maximally-efficient on-demand re-aggregation (arbitrarily re-composable hierarchies), and that covers the complexity of geometric and topological types widely used in data science and modelling. We provide tools in R for more general representations of spatial primitives and the intermediate forms required for translation and analytical tasks. These forms are conceptually independent of R itself and are readily implemented with standard tabular data structures.
There is not one single normal form that should always be used. There is one universal form that every other model may be expressed in, but also other forms that are better suited or more efficient for certain domains. We show that conversion between these forms is more straightforward and extensible than from SF or related types, but is also readily translated to and from standard types. The most important forms we have identified are “universal” (edges and nodes), “2D primitives” (triangles), “arcs” (shared boundaries), and “paths” (normalized forms of SF types).
# Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("hypertidy/silicate")
Convert a known external model to a silicate model.
library(silicate)
#>
#> Attaching package: 'silicate'
#> The following object is masked from 'package:stats':
#>
#> filter
x <- SC(minimal_mesh) ## convert simple features to universal form
y <- ARC(minimal_mesh) ## convert simple features to "arc-node" form
Obtain the elements of a known model type.
sc_vertex(x)
#> # A tibble: 14 × 3
#> x_ y_ vertex_
#> <dbl> <dbl> <chr>
#> 1 0 0 n88USC
#> 2 0 1 tQxSvg
#> 3 0.2 0.2 zpYYIG
#> 4 0.2 0.4 1qRN7h
#> 5 0.3 0.6 3FN2ty
#> 6 0.5 0.2 HInlqb
#> 7 0.5 0.4 KHLNO5
#> 8 0.5 0.7 gqkZ6j
#> 9 0.69 0 yJGoeK
#> 10 0.75 1 raggYW
#> 11 0.8 0.6 xekZyd
#> 12 1 0.8 XBwKMi
#> 13 1.1 0.63 9fpRi6
#> 14 1.23 0.3 0mxEDK
sc_edge(x)
#> # A tibble: 15 × 4
#> .vx0 .vx1 path_ edge_
#> <chr> <chr> <int> <chr>
#> 1 n88USC tQxSvg 1 mHkjqV
#> 2 tQxSvg raggYW 1 knBf33
#> 3 raggYW XBwKMi 1 NQVyd6
#> 4 gqkZ6j XBwKMi 1 avIufa
#> 5 gqkZ6j xekZyd 1 TCSls3
#> 6 yJGoeK xekZyd 1 AAMn9v
#> 7 n88USC yJGoeK 1 CwiGwr
#> 8 zpYYIG HInlqb 2 naREkh
#> 9 HInlqb KHLNO5 2 TptXFI
#> 10 3FN2ty KHLNO5 2 3nlZMA
#> 11 1qRN7h 3FN2ty 2 eXVs6C
#> 12 zpYYIG 1qRN7h 2 AwOvSP
#> 13 xekZyd 9fpRi6 3 gIpLK6
#> 14 9fpRi6 0mxEDK 3 OnFWZr
#> 15 yJGoeK 0mxEDK 3 q1WBzB
sc_node(y)
#> # A tibble: 2 × 1
#> vertex_
#> <chr>
#> 1 yUZl0a
#> 2 Z0xCkI
sc_arc(y)
#> # A tibble: 4 × 2
#> arc_ ncoords_
#> <chr> <int>
#> 1 bpKiyx 4
#> 2 oSxSsU 5
#> 3 q3ZSRx 7
#> 4 yUVgqO 2
There are two kinds of models, primitive and sequential.
Primitive-based models are composed of atomic elements that may be worked with arbitrarily, by identity and grouping alone.
Sequential-based models are bound to ordering and contextual assumptions. We provide the PATH and ARC models as generic, relational forms that provide a convenient intermediate between external forms and primitive models. Further intermediate models exist, including monotone and convex decompositions of polygons.
There is one universal primitives-based model, an edge-only model with two tables at its core. Higher level structures are described by grouping tables, with as many levels as required. Any other model can be expressed in this form.
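A short base R sketch of how a sequential form maps onto the edge-only form follows. The path, the column names and the object_ label are hypothetical values for illustration, not output from the package.

```r
# Sequential form: a closed path as an ordered run of vertex row numbers.
path <- c(1L, 2L, 3L, 4L, 1L)

# Primitive form: pair each vertex with its successor to get edges.
edge <- data.frame(.vx0 = head(path, -1),
                   .vx1 = tail(path, -1))

# Higher level structure is only a grouping column on the primitives.
edge$object_ <- "a"

# Edges can be stored in any order; identity and grouping are enough.
shuffled <- edge[c(3, 1, 4, 2), ]

print(edge)
print(shuffled)
```

The path depends on the order of its elements, whereas the edge table carries the same connectivity in rows that can be rearranged or subset freely.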
We also differentiate structural primitives, which are specializations that are more convenient or more efficient in certain cases. These include triangulations (2D primitives), and segment structures (1D primitives), and could provide higher dimensional forms (3D primitives, etc.).
Currently, we provide support for the universal model SC, the sequential models PATH (simple features belongs here, amongst many others) and ARC (arc-node topology, TopoJSON-like, OpenStreetMap), and structural primitives TRI.
In practice a segment model, “SEG”, is trivial to generate but we haven’t done that. This would be analogous to the format used by rgl::rgl.lines or spatstat::psp.
We take care to allow for labelling (identity) of component elements, without taking full responsibility for maintaining them. Random IDs are created as needed, but any operation that works with existing IDs should be stable with them.
The spacebucket (arbitrary multi-layer polygonal overlays) and sphier (generic hierarchies from atomic forms) show two different approaches to the problem of hierarchical data and flexible representations.
The key difference between the silicate approach and simple features is the separation of geometry and topology. This allows for normalization (de-duplication) of the entities that are present or that can be identified. Simple features has no capacity to de-duplicate or otherwise identify vertices, edges, paths or arcs, though tools that work with simple features do construct these schemes routinely in order to perform operations. When these richer, topological structures are built they are usually then discarded and the vertices are again de-normalized and again expressed explicitly without recording any of the relationships. In this sense, simple features can be described as an explicitly-stored PATH analogue, and is no different from the model used by shapefiles, binary blobs in databases, and many other spatial vector formats. There are a number of notable exceptions to this, including TopoJSON, Eonfusion, PostGIS, QGIS geometry generators, Fledermaus, Mapbox, WebGL, Threejs, D3, AFrame and Lavavu, but unfortunately there’s no overall scheme that can unify these richer structures.
The silicate family is composed of a small number of packages that apply the principles here, either to read from path forms or primitive forms. As work continues some of these will be incorporated into the silicate core, when that is possible without requiring heavy external dependencies.
Looking for a music reference? I always am: Child’s Play, by Carcass.
Please note that the ‘silicate’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.