vignettes/thereAndBackAgain.Rmd
thereAndBackAgain.Rmd
Consider the following artificially generated dataset:
data <- data.frame(A = c(19, 19, 25, 62, 34,
58, 62, 40, 24, 60,
70, 40, 40, 34, 26),
B = c(68, 63, 63, 13, 55,
78, 14, 14, NA, 28,
NA, 55, 57, 40, 78) )
There are 15 observations in the dataset; variable A
is
complete (has no missing values), whereas variables B
and
C
are missing 2 and 0 observations, respectively.
ggplot
to loon
transformation
ggplot
and its states
Begin with a ggplot
defined using the interactive
grammar extension:
ggp <- ggplot(data,
mapping = aes(x = A, y = B)) +
ggtitle("Some title") +
geom_point(color = "grey", size = 5) +
linking(linkingGroup = "my plots")
plot(ggp)
Information on the ggplot
corresponding to
loon
states is had using ggplot_build()
# get the ggplot data corresponding to loon "states"
ggp_states <- ggplot_build(ggp)$data[[1]]
ggp_states
#> x y PANEL group shape colour size fill alpha stroke
#> 1 19 68 1 -1 19 grey 5 NA NA 0.5
#> 2 19 63 1 -1 19 grey 5 NA NA 0.5
#> 3 25 63 1 -1 19 grey 5 NA NA 0.5
#> 4 62 13 1 -1 19 grey 5 NA NA 0.5
#> 5 34 55 1 -1 19 grey 5 NA NA 0.5
#> 6 58 78 1 -1 19 grey 5 NA NA 0.5
#> 7 62 14 1 -1 19 grey 5 NA NA 0.5
#> 8 40 14 1 -1 19 grey 5 NA NA 0.5
#> 9 24 NA 1 -1 19 grey 5 NA NA 0.5
#> 10 60 28 1 -1 19 grey 5 NA NA 0.5
#> 11 70 NA 1 -1 19 grey 5 NA NA 0.5
#> 12 40 55 1 -1 19 grey 5 NA NA 0.5
#> 13 40 57 1 -1 19 grey 5 NA NA 0.5
#> 14 34 40 1 -1 19 grey 5 NA NA 0.5
#> 15 26 78 1 -1 19 grey 5 NA NA 0.5
Note that some of the x
and y
values are
NA
, representing missing values.
loon
plot and its states
Now construct the loon
plot from this
ggplot
:
lp <- loon.ggplot(ggp)
plot(lp)
The loon
plot looks slightly different:
loon
plot.Plots in loon
have more constrained layouts than those
in ggplot2
, since focus is primarily on interactive as
opposed to publication quality graphics.
Some differences (e.g., colour) are also because loon
is
based on Tcl
for drawing primitives.
Plot states in loon
are accessible using []
and the name of the state. Values are assigned in the same way. Names of
interactively changeable states in loon
are
names(lp)
#> [1] "glyph" "itemLabel" "showItemLabels" "linkingGroup"
#> [5] "linkingKey" "zoomX" "zoomY" "panX"
#> [9] "panY" "deltaX" "deltaY" "xlabel"
#> [13] "ylabel" "title" "showLabels" "showScales"
#> [17] "swapAxes" "showGuides" "background" "foreground"
#> [21] "guidesBackground" "guidelines" "minimumMargins" "labelMargins"
#> [25] "scalesMargins" "x" "y" "xTemp"
#> [29] "yTemp" "color" "selected" "active"
#> [33] "size" "tag" "useLoonInspector" "selectBy"
#> [37] "selectionLogic"
# and accessed with [] as in
lp["title"]
#> [1] "Some title"
(See also ?l_info_states
.)
Some, like x
, y
, and color
,
are n-dimensional states whose values correspond to those on
individual observations. These may be changed using the
[] <-
notation as well.
Because loon
plots exclude observations missing values
in any its n-dimensional states, the loon
plot will have
fewer observations than the corresponding ggplot
, even
though the same number of points may be plotted. The actual number of
observations in the loon
plot is
lp["n"]
#> [1] 13
which is less than the number of observations recorded in the
ggplot
structure (namely nrow(ggp_states) =
15).
Differences between ggplot
structure and
loon
plot structure (and how these affect the
visualizations) is worth exploring in a little more detail. A better
understanding will be helpful when transforming back and forth between
the two.
In R
the point symbol plotted is associated with a
numerical code pch
. In loon
the point symbol
is a glyph, identified by a string.
ggp_states$shape
#> [1] 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
lp["glyph"]
#> [1] "circle" "circle" "circle" "circle" "circle" "circle" "circle" "circle"
#> [9] "circle" "circle" "circle" "circle" "circle"
As with other plot “states”, there is not always a glyph
in loon
that matches a pch
in R
.
Values of pch
with no counterpart in loon
will
map to the default glyph
in loon
.
Size is another plot state that differs between ggplot
and loon
.
ggp_states$size
#> [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
lp["size"]
#> [1] 26.78116 26.78116 26.78116 26.78116 26.78116 26.78116 26.78116 26.78116
#> [9] 26.78116 26.78116 26.78116 26.78116 26.78116
The point size
in loon
identifies the
approximate area of the point symbol, so when
transforming from ggplot
to loon
, the
size
parameter of the ggplot
is transformed to
an integer value in loon
that tries to match the area of
the point in ggplot
.
Comparing the plots, the points are slightly different
"grey"
in one plot than in the other.
This is because the two plots have different values stored as their colour state:
ggp_states$colour
#> [1] "grey" "grey" "grey" "grey" "grey" "grey" "grey" "grey" "grey" "grey"
#> [11] "grey" "grey" "grey" "grey" "grey"
lp["color"]
#> [1] "#808080808080" "#808080808080" "#808080808080" "#808080808080"
#> [5] "#808080808080" "#808080808080" "#808080808080" "#808080808080"
#> [9] "#808080808080" "#808080808080" "#808080808080" "#808080808080"
#> [13] "#808080808080"
First, there is a slight difference between what R
regards as "grey"
and what Tcl
(used by
loon
) regards as "grey"
. Several other
named colours in R
also differ slightly
from those in Tcl
and hence in loon
.
R
colours whose name is the same as that
in Tcl
but whose colour is different are
only the following:
(See ?tkcolors
in loon
for more
information.)
Second, loon
plot colour values are strings of
twelve hexadecimal digits, as in tk
(e.g.,
see ?tkcolors
); in contrast, the ggplot
are
either strings corresponding to the named R
colours (see
?grDevices::colors
in R
) or to
six hexadecimal digits (two for each of the red, green,
and blue components).
Twelve hex digit colours are turned into six hex digits using the
loon
function hex12to6()
. To convert the
R
named colours to six hex digits, the following function
can be used.
tohex <- function(x) {
sapply(x, function(xi) {
crgb <- as.vector(col2rgb(xi))
rgb(crgb[1], crgb[2], crgb[3], maxColorValue = 255)
})}
So, comparing the two "grey"
colours, the
ggplot
has the R
hex colour
tohex("grey") =
#BEBEBE and the loon
plot has
Tcl
hex colour (converted to hex 6)
hex12tohex6(lp["color"][1]) =
#808080, which are clearly
different RGB values.
The loon
plot does not include the missing data
nrow(ggp_states) == lp["n"]
#> [1] FALSE
# Compare
ggp_states$y
#> [1] 68 63 63 13 55 78 14 14 NA 28 NA 55 57 40 78
lp["y"]
#> [1] 68 63 63 13 55 78 14 14 28 55 57 40 78
Note that, absent the missing NA
s, the
order of the points is identical.
Default linking keys in loon
are from "0"
to "n-1"
where "n"
is the number of rows in
the original data set.
Here nrow(data) =
15. Note that this "n"
is
not the number of points actually plotted because loon
drops the missing data, unlike ggplot
.
The linking keys reflect this missingness:
lp["linkingKey"]
#> [1] "0" "1" "2" "3" "4" "5" "6" "7" "9" "11" "12" "13" "14"
which is missing "8"
and "10"
. Note also
that lp["n"] =
13 is the number of points plotted in the
loon
plot.
Some care needs to be taken when dealing with linking, especially when there is missing data.
See the vignette Linking
and the loon
vignette Logical
Queries for more information.
loon
to ggplot
transformation
A ggplot
from the loon
plot will not have
access to the original information.
ggp_lp_1 <- loon.ggplot(lp)
ggp_lp_1
Note that, though a ggplot
, the position of the title is
now centred. This is because the objective of the transformation is to
create a ggplot
that looks as much like the original
loon
plot as possible.
ggplot
states
To try to reflect the look of the loon
plot, the states
of the ggplot
are different from those of
the original ggplot
ggp
.
ggp_lp_1_states <- ggplot_build(ggp_lp_1)$data[[1]]
ggp_lp_1_states
#> colour size x y PANEL group shape fill alpha stroke
#> 1 #808080 5 19 68 1 1 19 NA NA 0.5
#> 2 #808080 5 19 63 1 1 19 NA NA 0.5
#> 3 #808080 5 25 63 1 1 19 NA NA 0.5
#> 4 #808080 5 62 13 1 1 19 NA NA 0.5
#> 5 #808080 5 34 55 1 1 19 NA NA 0.5
#> 6 #808080 5 58 78 1 1 19 NA NA 0.5
#> 7 #808080 5 62 14 1 1 19 NA NA 0.5
#> 8 #808080 5 40 14 1 1 19 NA NA 0.5
#> 9 #808080 5 60 28 1 1 19 NA NA 0.5
#> 10 #808080 5 40 55 1 1 19 NA NA 0.5
#> 11 #808080 5 40 57 1 1 19 NA NA 0.5
#> 12 #808080 5 34 40 1 1 19 NA NA 0.5
#> 13 #808080 5 26 78 1 1 19 NA NA 0.5
lp_ggp_lp_1 <- loon.ggplot(ggp_lp_1)
Perform some interactive changes, including selecting some points
selection <- lp["x"] > 50 &lp["y"] > 13
lp["selected"] <- selection
colorMeRed <- lp["x"] == 34
lp["color"][colorMeRed] <- "red"
loon.ggplot()
Linking problems can arise whenever a ggplot
constructed
from a loon
plot is then made interactive again.
Suppose a loon
plot, like lp
, is turned
into a ggplot
(typically, after some interactive changes)
via loon.ggplot()
, and the resulting new
ggplot
is then itself turned into another interactive plot.
The new interactive plot will not necessarily share the same linking
information as the original.
This is because the second interactive plot will have the default
values of linkingGroup
, linkingKey
and linked
display states; these values are lost in the transfer from the first
interactive plot to the ggplot
. That is,
the linking group is not automatically carried over to the second interactive plot
the linked states are not automatically carried over to the second interactive plot
the default linking keys in the second interactive plot
are "0"
, …, "n-1"
where n
is the
number of observations that were displayed in the first plot.
So, provided the first interactive plot has the default linking keys (and was built with complete data; i.e., no data was missing at creation), then the linking keys will match.
An important special case is when some observations
were selected in the first interactive plot at the time
the ggplot
is created. In this case, the linking
keys will typically not match.
For example,
# Get a ggplot from the loon plot
ggp_lp <- loon.ggplot(lp)
The interactive lp
and the new ggplot
ggp_lp
appear as follows.
The left plot, lp
, is an interactive loon
plot
and the magenta points are selected. The right plot,
ggp_lp
, is a static ggplot
which has no
selected
state and the magenta points are simply points
having that colour (and appear as that in the ggplot
legend).
In an interactive plot, selected points are visually emphasized in two ways:
When transferring that to a ggplot
, the selected points
appear with the highlight colour and the data
order is changed in the ggplot
so that they appear
on top of all other points in the display. If, instead, the argument
selectedOnTop = FALSE
is given to
loon.ggplot()
call, then the order of points will not be
changed in the ggplot
. This is strongly
recommended whenever the ggplot
will later be
turned into an interactive plot; this will allow the linking to match
more easily with the original interactive plot.
To see the effect of this (and of missing values) on linking, suppose
an interactive plot is created from the ggplot
ggp_lp
:
# The loon plot from the resulting ggplot
lp_ggp_lp <- loon.ggplot(ggp_lp)
Compare the linking keys of this plot with the original:
# The original loon plot has linking keys
lp["linkingKey"]
#> [1] "0" "1" "2" "3" "4" "5" "6" "7" "9" "11" "12" "13" "14"
# And the loon plot from the derived ggplot
lp_ggp_lp["linkingKey"]
#> [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
Clearly, these are not the same and will match observations correctly
("0"
to "7"
), some incorrectly
("9"
, "11"
, and "12"
), and some
not at all ("8"
, "13"
, and
"14"
).
The difference is that lp
was created with data missing
values corresponding to the missing linking keys "8"
and
"10"
. This information was not available on the
ggplot
ggp_lp
so that the new
loon
plot would be created with the default values
"0"
to "12"
. The result is that some points
will be wrongly linked between lp
and
lp_ggp_lp
.
The selected points in lp
(highlighted magenta) will
cause reordering in ggp_lp
and hence in
lp_ggp_lp
. This too will cause problems in linking and
other states of the new lp_ggp_lp
. For example, the two
point orders can be seen in the respective values of their
"x"
:
# The original point order
lp["x"]
#> [1] 19 19 25 62 34 58 62 40 60 40 40 34 26
# The new plot's order
lp_ggp_lp["x"]
#> [1] 19 19 25 62 34 40 40 40 34 26 58 62 60
This makes problems for matching the correct observations.
Other states are also changed because of the transition from
loon
plot to ggplot
to loon
plot.
In particular, the "color"
and "selected"
states will not match.
# Original selected
lp["selected"]
#> [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
#> [13] FALSE
# the new plot has nothing selected
lp_ggp_lp["selected"]
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [13] FALSE
This is for two reasons. First the selected points of lp
changed to colour in ggp_lp
and so the selected points from
lp
could not be transferred to lp_ggp_lp
. The
colours in lp_ggp_lp
will not match those of. Second, the
new plot has not (yet) joined the same linking group as the
original.
# Original selected
lp["color"]
#> [1] "#808080808080" "#808080808080" "#808080808080" "#808080808080"
#> [5] "#FFFF00000000" "#808080808080" "#808080808080" "#808080808080"
#> [9] "#808080808080" "#808080808080" "#808080808080" "#FFFF00000000"
#> [13] "#808080808080"
# the new plot has nothing selected
lp_ggp_lp["color"]
#> [1] "#808080808080" "#808080808080" "#808080808080" "#808080808080"
#> [5] "#FFFF00000000" "#808080808080" "#808080808080" "#808080808080"
#> [9] "#FFFF00000000" "#808080808080" "#FFFF0000FFFF" "#FFFF0000FFFF"
#> [13] "#FFFF0000FFFF"
When observations are also highlighted in the
original loon
plot, as in lp
, a little more
care needs to be taken with respect to the linking keys. The problem is
that in constructing the ggplot
, the data has to be
reordered to ensure that the selected highlighted points appear on top
of the other points in the ggplot
.
Care needs to be taken to manage the linking keys when moving from
the static ggplot
to the interactive loon
plot. This information can be added when the interactive plot is
created, as arguments to loon.ggplot()
as follows.
# Add the linking information when creating the interactive plot
lp_ggp_l1_lk <- loon.ggplot(ggp_lp,
linkingKey =lp["linkingKey"],
linkingGroup = "NA example")
# Now compare
lp_ggp_l1_lk["linkingKey"]
#> [1] "0" "1" "2" "3" "4" "5" "6" "7" "9" "11" "12" "13" "14"
# to the original loon plot
lp["linkingKey"]
#> [1] "0" "1" "2" "3" "4" "5" "6" "7" "9" "11" "12" "13" "14"
Alternatively, the grammar could have been used as in
# Add the linking information when creating the interactive plot
lp_ggp_l1_ggk <- loon.ggplot(ggp_lp +
linking(linkingGroup = "NA example",
linkingKey =lp["linkingKey"]))
# Again compare
lp_ggp_l1_ggk["linkingKey"]
#> [1] "0" "1" "2" "3" "4" "5" "6" "7" "9" "11" "12" "13" "14"
# to the original loon plot
lp["linkingKey"]
#> [1] "0" "1" "2" "3" "4" "5" "6" "7" "9" "11" "12" "13" "14"
Of course, all this depends on the user knowing where to find the correct linking information.
Comparing the three different interactive plots shows how information
can be lost, as well as how it can be maintained, when translating from
interactive to static ggplot
back to an interactive
loon
plot.