transitiontrees draws every static plot in pure
ggplot2, with no extra plotting dependency, plus one optional
interactive renderer (visNetwork). Every static plot returns a
standard ggplot object you can theme, save, or further modify. This vignette
tours them all and explains how to read each one.
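If the static plots are ordinary ggplot2 objects, as the introduction says, the usual ggplot2 workflow should apply to them. The sketch below shows that workflow on a stand-in plot built from made-up data; `toy`, `p`, and `out_file` are illustrative names, and `p` is not a transitiontrees figure. In practice `p` would be whatever a static plotting call in this vignette returns.

```r
library(ggplot2)

# Stand-in for a static plot: any ggplot object is handled the same way.
# The data are made up purely for illustration.
toy <- data.frame(depth = c(0, 1, 1, 2), count = c(100, 60, 40, 25))
p <- ggplot(toy, aes(x = depth, y = count)) +
  geom_point(aes(size = count))

# Theme and relabel with ordinary ggplot2 layers.
p_styled <- p +
  theme_minimal(base_size = 12) +
  labs(title = "Stand-in figure", x = "Depth", y = "Context count")

# Save to a temporary PDF; width and height are in inches by default.
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p_styled, width = 7, height = 5)
```

Writing to a PDF keeps the example free of optional graphics packages; any other device supported by `ggsave()` works the same way.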
A shared convention across the static tree styles: node size = context count, node fill = the most-recent state of the pathway, and edge thickness = the volume of sequences flowing down that branch.
Every plot below is drawn from the same fitted, pruned tree on the
bundled trajectories data (138 learners, three engagement
states). We fit it once here and reuse it throughout.
library(transitiontrees)
data(trajectories)
set.seed(1)
tree <- context_tree(trajectories, max_depth = 3L, min_count = 5L)
pruned <- prune_tree(tree, criterion = "G2", alpha = 0.05)
pruned
#> <transitiontrees> 18 nodes, depth <= 3, 3 states [pruned]
#> alphabet : Active, Average, Disengaged
#> fit on : 136 sequences, 1870 observations
#> smoothing: floor(ymin=0.001, rule=interpolate) min_count = 5
#> pruned by: G2 alpha = 0.05
#> (start) n=1870 -> Average (0.43)
#> |-- Active n=658 -> Active (0.70)
#> |   |-- Active n=433 -> Active (0.79)
#> |   |   `-- Average n=70 -> Active (0.53)
#> |   `-- Average n=144 -> Active (0.50)
#> |       `-- Disengaged n=12 -> Average (0.83)
#> |-- Average n=751 -> Average (0.61)
#> |   |-- Active n=160 -> Average (0.52)
#> |   |   `-- Disengaged n=10 -> Average (0.50)
#> |   |-- Average n=419 -> Average (0.68)
#> |   |   `-- Active n=80 -> Average (0.57)
#> |   `-- Disengaged n=122 -> Average (0.52)
#> |       `-- Disengaged n=31 -> Average (0.71)
#> `-- Disengaged n=325 -> Disengaged (0.48)
#>     |-- Active n=23 -> Active (0.39)
#>     |-- Average n=134 -> Average (0.50)
#>     |   `-- Active n=17 -> Active (0.41)
#>     `-- Disengaged n=139 -> Disengaged (0.68)

Root on the left, depth rightward; every leaf is labelled with its full arrow-form pathway and the predicted next state. This is the style for a paper when you need to cite specific pathways inline.
point_size_range and edge_size_range
exaggerate or compress the size dynamic range – useful for slides where
the count contrast must read from the back of the room. The encodings
are unchanged; only the scales differ.
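The idea of keeping the encoding while changing only the scale can be illustrated in plain ggplot2. This is not the package's implementation: it assumes a size range behaves like the `range` argument of `ggplot2::scale_size()`, and it reuses the three depth-1 counts from the printed tree above. The object names are illustrative.

```r
library(ggplot2)

# Depth-1 context counts taken from the printed tree.
nodes <- data.frame(
  state = c("Active", "Average", "Disengaged"),
  count = c(658, 751, 325)
)

base <- ggplot(nodes, aes(x = state, y = 1, size = count)) +
  geom_point()

# Same data and same mapping (size = count); only the output range differs.
compact    <- base + scale_size(range = c(3, 6))   # compressed contrast
exaggerate <- base + scale_size(range = c(2, 14))  # exaggerated contrast

# Rendered point sizes for each version.
sizes_compact    <- layer_data(compact)$size
sizes_exaggerate <- layer_data(exaggerate)$size
```

The smallest and largest counts land on the two ends of whichever range is supplied, so widening the range stretches the contrast without changing which node is bigger.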
The same tree wrapped into a circle: the eye separates the thick central branches (the corpus highways) from the thin outer twigs (contexts that pruning kept on evidence rather than volume).
A space-filling partition: arc angular width is proportional to count, so a dominant state visually swallows the ring – an honest depiction of class imbalance.
A fourth style, style = "interactive", renders the same
tree as a draggable, zoomable visNetwork widget (collapse
the dominant spine and the rare informative branches become legible). It
produces an HTML widget rather than a static figure, so it is best run
in an interactive session rather than shown inline here.
These complement the tree by ranking pathways rather than drawing topology.
Each row is a context, each column a next state, and each cell shows
P(next | context), with the modal cell in bold. A >
prefix marks a context whose modal next state flips versus its shorter
parent. Sorting the same data two ways gives the single
best “common vs informative” figure:
Sorted by count, the bright cells stack on the most frequent next state; sorted by divergence, they move off it. That lateral shift is the thesis in one comparison.
Per-context KL divergence from the shorter parent, ranked, with orange points
marking modal-flip contexts: the histories that genuinely
change the prediction. Setting min_count filters out low-count contexts whose
apparent divergence is a small-sample mirage.
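For orientation, the quantity being ranked is the Kullback-Leibler divergence between a context's next-state distribution and that of its shorter parent. Below is a minimal base-R sketch, assuming the divergence is measured in bits; the probabilities are made up rather than read from the fitted tree, and `kl_bits()` is a hypothetical helper, not a package function.

```r
# Hypothetical next-state distributions over the three engagement states.
# These numbers are illustrative only; they are not read from the fitted tree.
states  <- c("Active", "Average", "Disengaged")
parent  <- c(0.50, 0.35, 0.15)  # P(next | shorter context)
context <- c(0.20, 0.70, 0.10)  # P(next | longer context)
names(parent) <- names(context) <- states

# KL(context || parent) in bits; terms with zero probability contribute 0.
kl_bits <- function(p, q) {
  keep <- p > 0
  sum(p[keep] * log2(p[keep] / q[keep]))
}

divergence <- kl_bits(context, parent)

# A "modal flip": the most likely next state differs between context and parent.
modal_flip <- names(which.max(context)) != names(which.max(parent))
```

Here the longer context moves the most likely next state from Active to Average, so it would count as a modal flip as well as a divergent context.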
plot_pruning() walks a pathway’s suffix chain – the full
context, then the same context with its oldest move dropped, down to the
root – and marks which contexts the pruning test keeps (solid) versus
drops (faded). It answers, for that one pathway, how far back history
actually has to reach.
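The suffix chain itself is easy to write down. The sketch below is a plain base-R illustration, not the package's code: `suffix_chain()` is a hypothetical helper, and it assumes a context is stored oldest move first, with the empty context standing for the root.

```r
# Build the suffix chain of a context: drop the oldest move one step at a time.
# `context` is ordered oldest -> most recent (an assumption made for this sketch).
suffix_chain <- function(context) {
  n <- length(context)
  chain <- lapply(seq_len(n), function(i) context[i:n])
  c(chain, list(character(0)))  # the empty context stands for the root
}

chain <- suffix_chain(c("Average", "Active", "Active"))

# Arrow-form labels, using "(start)" for the root as in the printed tree.
labels <- vapply(
  chain,
  function(x) if (length(x)) paste(x, collapse = " -> ") else "(start)",
  character(1)
)
```

If the first level below the root is read as the most recent state, this chain corresponds to the branch in the printed tree that runs from the n=70 node up through n=433 and n=658 to the root.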
plot_predictive() scores sequences against the fitted
tree three ways. For this tour we score the bundled
trajectories themselves; in a real evaluation, pass
genuinely held-out sequences (the Advanced analysis vignette
shows the cross-validated route).
type = "logloss" plots per-position surprise in bits against
position; anything below the uniform ceiling (log2 of the number of
states) is structure the model exploited:
type = "ecdf" – the distribution of the probability
assigned to the state that actually occurred; steep steps reveal
calibration plateaus (e.g. a mass of three-way-open branch points):
A third type, type = "position", traces each individual
sequence’s confidence move-by-move (one grey line per sequence). It is a
per-sequence view that only reads cleanly for a handful of sequences, so
it is omitted here; reach for it when you want to inspect a few specific
trajectories rather than the corpus as a whole.
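The two corpus-level views rest on simple quantities that can be sketched in base R. The probabilities below are made up for illustration (they are not output from the fitted tree), and the variable names are hypothetical; the sketch assumes surprise is the negative base-2 log of the probability assigned to the observed state.

```r
# Hypothetical probabilities assigned to the states that actually occurred.
# The repeated 1/3 mimics positions where all three next states looked equally likely.
p_observed <- c(0.43, 0.70, 0.79, 0.53, 1/3, 1/3, 1/3, 0.25)

# Log-loss view: surprise in bits at each position.
surprise_bits <- -log2(p_observed)

# Ceiling for a uniform guess over the three engagement states.
uniform_ceiling <- log2(3)

# Positions where the model did better than a uniform guess.
below_ceiling <- surprise_bits < uniform_ceiling

# ECDF view: distribution of the probability given to the observed state.
F_hat <- ecdf(p_observed)

# Share of positions scored at or below the uniform level of 1/3.
share_at_or_below_uniform <- F_hat(1/3)
```

A cluster of identical probabilities, like the three values of 1/3 here, shows up in the ECDF as one tall step.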
The context tree reads backward; plot_trajectories()
draws the same sequences forward in time. Colour by
frequency (how many sequences walk each path) or by
predictability (P(state | history) from
the model). Read together they separate traffic from predictability – a
wide-but-pale edge is a high-traffic decision point.
Forward trajectories show their structure best on a richer alphabet,
so this section uses the bundled ai_long log (eight
AI-prompting move types) rather than the three-state engagement data
above.
data(ai_long)
tree_ai <- context_tree(ai_long, actor = "project", session = "session_id",
                        action = "code", max_depth = 3L, min_count = 10L)
pruned_ai <- prune_tree(tree_ai)

Each pathway’s 95% bootstrap interval on G-squared against the chi-square critical value (dashed line); colour encodes the trust quadrant. A bar entirely to the right is reproducibly informative.
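To make the comparison concrete, here is a minimal sketch of the textbook G-squared (likelihood-ratio) statistic, its chi-square critical value, and a percentile interval from resampling. Whether transitiontrees computes any of these exactly this way is an assumption: `g_squared()` is a hypothetical helper, the counts are made up, and the multinomial resampling of counts is a simplification standing in for a bootstrap over sequences.

```r
# Textbook G-squared statistic for one context versus its parent distribution.
g_squared <- function(observed, parent_prob) {
  expected <- sum(observed) * parent_prob
  keep <- observed > 0  # 0 * log(0) is taken as 0
  2 * sum(observed[keep] * log(observed[keep] / expected[keep]))
}

# Hypothetical next-state counts for a context, and its parent's distribution.
observed    <- c(Active = 10, Average = 50, Disengaged = 10)
parent_prob <- c(Active = 0.30, Average = 0.50, Disengaged = 0.20)

g2 <- g_squared(observed, parent_prob)

# Critical value at alpha = 0.05 with (number of states - 1) degrees of freedom.
critical <- qchisq(1 - 0.05, df = length(observed) - 1)

# Simplified resampling: redraw the counts 100 times and recompute G-squared.
set.seed(1)
reps <- rmultinom(100, size = sum(observed), prob = observed / sum(observed))
g2_boot <- apply(reps, 2, g_squared, parent_prob = parent_prob)

# 95% percentile interval; "entirely to the right" means the lower end clears the line.
interval <- quantile(g2_boot, c(0.025, 0.975))
reproducible <- interval[["2.5%"]] > critical
```

The single-sample check `g2 > critical` only says the context is informative once; the interval asks whether it stays informative across resamples.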
boot <- bootstrap_pathways(pruned, iter = 100L, seed = 1L)
plot(boot)
#> `height` was translated to `width`.

We name an external group column (Achiever) on the
bundled group_regulation_long log;
context_tree(group = ) fits one tree per cohort, and
compare_trees() consumes the group directly.
data(group_regulation_long)
grp_reg <- context_tree(group_regulation_long,
                        actor = "Actor", time = "Time", action = "Action",
                        group = "Achiever", max_depth = 2L, min_count = 10L)
cmp <- compare_trees(prune_tree(grp_reg), iter = 199L, seed = 1L)
plot(cmp)

The observed distance (orange line) sits in the right tail of the label-shuffled null (grey) – the visual form of the permutation p-value.
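The logic behind that picture is a generic label-shuffling permutation test, sketched here in base R. Everything in the sketch is a stand-in: the data are simulated, the group labels "A" and "B" are arbitrary, and `group_distance()` is a hypothetical distance (absolute difference in means), not the one compare_trees() uses. Only the number of permutations, 199, is taken from the call above.

```r
set.seed(1)

# Simulated data: one numeric summary per learner and a two-level group label.
group <- rep(c("A", "B"), each = 30)
score <- c(rnorm(30, mean = 0.6, sd = 0.2), rnorm(30, mean = 0.4, sd = 0.2))

# Stand-in distance between cohorts: absolute difference in group means.
group_distance <- function(x, g) {
  abs(mean(x[g == "A"]) - mean(x[g == "B"]))
}

observed <- group_distance(score, group)

# Null distribution: recompute the distance after shuffling the labels.
iter <- 199L
null <- replicate(iter, group_distance(score, sample(group)))

# Permutation p-value, counting the observed value as one of the permutations.
p_value <- (1 + sum(null >= observed)) / (iter + 1)
```

With 199 shuffles the smallest attainable p-value is 1/200, which is why permutation counts of the form 199 or 999 are common.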
A flat-then-rising perplexity curve is the picture of a short-memory process; the orange star marks the cross-validated winner.
plot_difference() draws the per-context residual map for
the same tree fitted with group =, showing where two cohorts resolve
the same history toward different next states. depth = 1L
keeps the map to the single-state contexts so the rows stay legible (a
deep tree has too many contexts to label).
| Goal | Function |
|---|---|
| The tree | plot(style = c("horizontal", "dendrogram", "icicle", "interactive")) (interactive = visNetwork widget) |
| Rank pathways | plot_pathways(), plot_divergence(), plot_distributions() |
| Memory of one pathway | plot_pruning() |
| Held-out quality | plot_predictive(type = c("logloss", "ecdf", "position")) |
| Forward trajectories | plot_trajectories(measure = c("frequency", "predictability")) |
| Reliability | plot(<bootstrap>), plot_pathway_resamples() |
| Comparison | plot(<comparison>), plot_difference() |
| Tuning | plot(<tune>) |