Hey!
As many of you will be aware by now, I started writing a port of ggplot2 sometime in the middle of last year:
https://github.com/Vindaar/ggplotnim
After many sometimes frantic sessions working on this, I'm finally approaching a first personal milestone: Essentially all features I consider essential for a plotting library (for my personal use cases!) are (or are about to be) implemented. This will mark the release of version v0.3.0.
The remaining features I will implement in the next few days are:
Now, the main reason I open this topic is to ask all of you about what I should focus on once the above is done.
There are several ideas I have in mind, but definitely not the time to tackle them all at once. They are:
One of the main goals I had in mind when starting this whole project was to provide two different plotting backends. One native target to produce plots locally, fast and statically.
On the other hand, originally inspired by @mratsim's monocle, a Vega-Lite backend to scratch that interactive / web based itch, which allows for easy sharing of plots including data!
I wrote a proof of concept and by now I have a pretty good idea (barring a lack of Vega experience) on how to implement this.
Essentially the whole processing of the plot remains the same as it is now. This makes it possible to use the full functionality of ggplotnim without a lot of duplication. Only the drawing code will be replaced by a mapping to JSON.
The major work involved would be defining said mapping. If I'm lucky I can even write it as a ginger backend with a - for Vega pretty obscure - API (drawPoint, drawLine, etc. essentially just adding data to a JsonNode). More likely it'll involve replacing the drawing portion of ggplotnim with Vega related drawing equivalents.
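To give an idea, here's a minimal sketch of how such a backend could accumulate JSON (VegaBackend, drawPoint and toVegaLite are hypothetical names for illustration, not the actual ginger API):

import json

type
  VegaBackend = object
    marks: JsonNode # accumulates one entry per draw call

proc initVegaBackend(): VegaBackend =
  VegaBackend(marks: newJArray())

proc drawPoint(b: var VegaBackend, x, y: float) =
  # "drawing" just records the data point as JSON
  b.marks.add %*{"mark": "point", "x": x, "y": y}

proc toVegaLite(b: VegaBackend): JsonNode =
  # wrap the accumulated marks in a Vega-Lite schema skeleton
  result = %*{"$schema": "https://vega.github.io/schema/vega-lite/v4.json"}
  result["layer"] = b.marks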
The included data frame in ggplotnim is - for many operations anyways - abysmally slow.
While performance is nice, I mainly wanted something to work with "right now" instead of spending a lot of time writing a performant data frame.
The reasons for the poor performance are threefold, as far as I can tell:
One way to improve performance would be to distinguish between a pure column of a single data type and Value columns (which are somewhat similar to object columns in numpy / pandas, if my superficial understanding of those is correct).
While I'm not certain, I believe that distinction alone would make the code a lot more complex and would definitely require heavy use of generics. Generics are something I specifically wanted to avoid in the context of a data frame, because each time I played around with toy data frames they became a headache.
The only idea I have to avoid generics would be to extend a Value to also have a case for vector like data, similar to JsonNode's JArray. That would double the number of fields though.
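Roughly, such an extension might look like this (a sketch with made-up kind names, not ggplotnim's actual Value definition):

type
  ValueKind = enum
    VInt, VFloat, VString, VBool, VNull,
    # vector like counterparts, akin to JsonNode's JArray,
    # doubling the number of cases:
    VIntSeq, VFloatSeq, VStringSeq, VBoolSeq

  Value = object
    case kind: ValueKind
    of VInt: num: int
    of VFloat: fnum: float
    of VString: str: string
    of VBool: bval: bool
    of VNull: discard
    of VIntSeq: nums: seq[int]
    of VFloatSeq: fnums: seq[float]
    of VStringSeq: strs: seq[string]
    of VBoolSeq: bvals: seq[bool]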
In any case, if I were to seriously attempt to improve performance of the data frames, I would stop messing around myself and first do some research into how data frames are handled elsewhere.
Again, if anyone is familiar with resources, feel free to share them!
This is pretty self explanatory. The main documentation is definitely lacking as it is right now.
I hope that the recipes provide anyone of you who tried to play around with the library with a reasonable alternative for the time being!
There's a lot of functionality in ggplot2 that makes it a proper R package. Namely a lot of stats related functionality. Simple things like box and violin plots, smoothing and error bands and probably a lot more I'm not even aware of, since I don't really use that stuff.
If that's something people want, I could definitely work on that.
At least box and violin plots and simple loess smoothing are something I'll implement at some point anyways. If there's something else you consider essential, let me know.
This is a fun idea I had a while back. As far as I'm aware our poor fellows who are stuck working with C and C++ don't really have a great plotting library to work with.
Some of them are very powerful but hard to use, some produce plots that don't look very nice, and some others (looking at you, ROOT) bring along an oil tanker of dependencies.
Maybe I'm missing something, but as far as I can tell there are many people who do their calculations in C/C++, dump the data and use python for plotting.
I'm not sure if people would be interested in such a thing, but Nim being awesome would allow for a shared library that makes essentially all of ggplotnim's functionality usable from C. I wrote a small scatter plot function and it worked perfectly.
Maybe this wouldn't work out as well as I think right now, but it feels like maybe a great opportunity for Nim to shine.
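As a rough sketch of what that wrapper could look like (scatterPlot is an illustrative name, not an existing ggplotnim API):

# compile with: nim c --app:lib plotlib.nim
import ggplotnim

proc scatterPlot(x, y: ptr UncheckedArray[cdouble],
                 n: cint, fname: cstring) {.exportc, dynlib.} =
  # copy the raw C buffers into Nim seqs
  var xs = newSeq[float](n)
  var ys = newSeq[float](n)
  for i in 0 ..< n.int:
    xs[i] = x[i]
    ys[i] = y[i]
  let df = seqsToDf({"x": xs, "y": ys})
  ggplot(df, aes("x", "y")) + geom_point() + ggsave($fname)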
Currently dates and times are not supported in ggplotnim. I'm being made aware of this every day at the moment, thanks to many COVID-19 plots I see daily, which have dates on their x axis…
This is definitely a must have, but I haven't really thought about how I want to implement this.
Once the time comes for me to write my thesis, I will probably want more power over the typesetting on the plots, especially to choose arbitrary fonts, set equations etc.
One main feature matplotlib still has over ggplotnim for me is to easily put LaTeX onto plots.
This is something I will attempt to implement at some point this year. I don't yet know how to best do it, but I have a few ideas.
I could go crazy and write a tikz backend for ginger I suppose. I'm not familiar enough with tikz to know how flexible it is, but it seems doable.
Or I could split the non text based plot and the text based stuff into two outputs, dump the text into a LaTeX template, compile it and merge the two files.
I'll figure something out.
If you have any other ideas or maybe I'm missing something important, feel free to let me know. Either post it here, or open an issue on the repository.
Sorry for rambling so long. ;)
In general I want to encourage anyone who tries out ggplotnim to feel free to open issues on the repository freely. Please don't think it's strictly for bug reports. If you struggle using the library chances are I'm at fault. Either the documentation sucks, you're using it in ways I didn't foresee, which may thus be cumbersome, etc. I'll try to help out as best as I can!
Thanks for reading!
I'll use this thread to update you on releases and post changelogs in the future.
This is awesome! I have not really used Nim for signal processing or data science work yet, but I think it would work very well with the right libraries, and this is one of the libraries that could make that happen.
Is this compatible with other libraries, such as arraymancer, etc? I think that one of the biggest strengths of the python numerical ecosystem is the good inter-operability of most plotting libraries with numpy. So if that is not already the case I would suggest making that your highest priority.
Other than that, I didn't see mention of support for contour plots in the docs. It is surprising how often those come in handy in many scenarios so I'd like for you to add that if it is not available yet. Another thing I like to do is to combine line plots with histograms and/or kernel density plots on the X and Y axis (to get a quick idea of the distribution of the values, particularly in time series). It would be neat to support for that too.
Finally, in signal processing work you are often working with complex samples. In that context it is often handy to plot the I and Q components vs time, placing 2 subplots on top of each other, and linking the X (time) zoom/pan of the two subplots. It would be really nice if that were supported.
@Vindaar ggplotnim is a very nice library!
Is this compatible with other libraries, such as arraymancer, etc? I think that one of the biggest strengths of the python numerical ecosystem is the good inter-operability of most plotting libraries with numpy. So if that is not already the case I would suggest making that your highest priority.
I second that, along with using a common DataFrame. numpy is the foundation of Python data libraries such as pandas, but we don't have such a recognized data foundation in Nim yet. Having a coordinated effort would benefit many scientific Nim projects.
See the Data Science section for other DataFrame libraries.
Is this compatible with other libraries, such as arraymancer, etc? I think that one of the biggest strengths of the python numerical ecosystem is the good inter-operability of most plotting libraries with numpy. So if that is not already the case I would suggest making that your highest priority.
The answer to that is "sort of". I'll need to explain a little to answer the why and what I mean by "sort of".
The long answer
Originally when I started the library I never planned to write a data frame library to go with this. I quickly realized however that (at least with a library like ggplot2) one doesn't work well without the other. In a normal plotting library every plotting function is a special case. Essentially each kind of plot wants data in a specific form / of a specific data type.
So in the beginning I specifically didn't want to use arraymancer internally. I love that library, but given that all I wanted to write was a "plotting library", this meant two things specifically for me:
For this reason I decided to avoid having arraymancer as a dependency, because all its strengths are mostly useless for the intended purpose, while it would mean introducing an unnecessary dependency.
If a user is using arraymancer for calculations, it's easy to convert the required data to ggplotnim's data types. I felt the overhead of copying the data was not a big deal under the assumption mentioned above.
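Such a conversion is only a couple of lines; a minimal sketch (assuming a rank 2 tensor with the x / y data in its columns):

import arraymancer, ggplotnim, sequtils

# some (n, 2) tensor with x in column 0 and y in column 1
let data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]].toTensor
let xs = toSeq(data[_, 0]) # copies the column into a seq[float]
let ys = toSeq(data[_, 1])
let df = seqsToDf({"x": xs, "y": ys})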
But, things did somewhat change when I started to write the data frame.
My first idea was actually to use NimData, since I really like that library. However, the (depending on viewpoint) advantage / disadvantage that its type is entirely defined via a schema at compile time didn't appeal to me. I didn't want to end up with a ggplot2 clone that was super restrictive, because everything had to be known at compile time.
I was actually hoping that @bluenote would pick up his development of Kadro again:
https://github.com/bluenote10/kadro
That sounded perfectly suited. But since he didn't, I simply started to hack together something that suits the needs of the library.
Originally in fact the DataFrame type was generic and my goal was to write the code in such a way that the underlying type does not matter. This made things complicated though. In fact I even thought about an arraymancer backend from the start:
https://github.com/Vindaar/ggplotnim/blob/master/playground/arraymancer_backend.nim
which however never progressed from there, mainly because I couldn't figure out how to make use of arraymancer's performance when the majority of data frame operations I did ended up copying data around. That is how I ended up with @PMunch's persistent vector from Clojure. It kind of allowed me to "copy as much as I want" without the performance penalty.
This is how we got to the current situation. The data frame is okayish fast for the simple things needed to prepare a plot. For anything else, I can't recommend it (also because it's extremely lenient about types!).
tl;dr
Compatibility with the "rest of the ecosystem" isn't there for practical reasons.
The thing is I'd love to profit from @mratsim's amazing work on arraymancer and laser!
Once I go back and reconsider performance of the data frame, I hope I will end up using as much of arraymancer as I can to be honest. I just need to figure out how to do it. :)
Other than that, I didn't see mention of support for contour plots in the docs. It is surprising how often those come in handy in many scenarios so I'd like for you to add that if it is not available yet. Another thing I like to do is to combine line plots with histograms and/or kernel density plots on the X and Y axis (to get a quick idea of the distribution of the values, particularly in time series). It would be neat to support for that too.
Good point. Contour plots are something I simply didn't think about.
I've never actually thought about how those are implemented before. I guess it's just a 2 dimensional KDE, right?
Since I will be implementing geom_density, for which I need KDEs anyways, I might as well implement N dimensional KDEs. That makes performance an even bigger issue though.
This is a case where implementing this in arraymancer would definitely be helpful and then just pass the required data from a DF to arraymancer. Maybe in a few months time, we can just write:

import ggplotnim, arraymancer
let df: DataFrame = someDataFrame()
let dfKde = df.kde("x", "y", "z")

where kde would be an arraymancer proc; or something along those lines…
Finally, in signal processing work you are often working with complex samples. In that context it is often handy to plot the I and Q components vs time, placing 2 subplots on top of each other, and linking the X (time) zoom/pan of the two subplots. It would be really nice if that were supported.
When you say "on top of each other", do you mean essentially a plot with a secondary axis? These are already supported, but are somewhat limited right now.
There's no recipe for these at the moment though. An example:
https://gist.github.com/Vindaar/5292d4d9b8fb667e3eb27061627dbbfe#gistcomment-3225761
The downside of secondary axes at the moment is that the secondary axis is just a fake axis. It draws ticks and labels on the RHS of the plot, but the underlying data is still drawn into the coordinate system defined by the main axis.
I know that ggplot2 explicitly does not allow completely independent axes (only those which can be calculated from one another, e.g. unit conversions), because Hadley Wickham thinks other cases are easily misleading. And to an extent I agree. However, I do think there is a place for them, so I will provide better support for them in the future.
real subplots
Or do you mean a normal subplot consisting of several (in principle not connected) plots in a single graphic? Those are also supported, but their use is not perfectly nice yet. One has to make use of ginger functionality directly.
An example inspired by: https://staff.fnwi.uva.nl/r.vandenboomgaard/SP20162017/SystemsSignals/plottingsignals.html
import ggplotnim, seqmath, math, sequtils, complex, ginger

let t = linspace(-0.02, 0.05, 1000)
let y1 = t.mapIt(exp(im(2'f64) * Pi * 50 * it).re)
let y2 = t.mapIt(exp(im(2'f64) * Pi * 50 * it).im)
let df = seqsToDf({ "t" : t,
                    "Re x(t)" : y1,
                    "Im x(t)" : y2 })
let plt1 = ggcreate(
  ggplot(df, aes("t", "Re x(t)")) +
    geom_line() +
    xlim(-0.02, 0.05) +
    ggtitle("Real part of x(t)=e^{j 100 π t}"),
  width = 800, height = 300
)
let plt2 = ggcreate(
  ggplot(df, aes("t", "Im x(t)")) +
    geom_line() +
    xlim(-0.02, 0.05) +
    ggtitle("Imaginary part of x(t)=e^{j 100 π t}"),
  width = 800, height = 300
)
# combine both into a single viewport to draw as one image
var plt = initViewport(wImg = 800, hImg = 600)
plt.layout(1, rows = 2)
# embed the finished plots into the new viewport
plt.embedAt(0, plt1.view)
plt.embedAt(1, plt2.view)
plt.draw("real_imag_subplot.pdf")
Which produces the following plot: https://gist.github.com/Vindaar/5292d4d9b8fb667e3eb27061627dbbfe#gistcomment-3225762
Another example can be found here: https://gist.github.com/Vindaar/fc158afbc75627260aed90264398e473
If you have something else in mind, let me know!
Thank you for your detailed explanation of the current situation and how you got there. I hope you can find a way to improve the interoperability story of this promising library.
Regarding the performance of the library I'd say that to me the most important requirement is to be able to plot huge amounts of data. Ideally, hundreds of thousands, even millions of points should not be a problem and should not take more than a few seconds to plot (the less time the better of course :-) ). Being able to update the plot a few times per second (with a reasonably small dataset and a simple plot) would be nice too (but no need to achieve tens of FPS or something like that IMHO).
As for the “complex signal” plot, I was referring to a plot with two separate subplots (placed one above the other) where the Y axis zoom is independent but the X axis zoom and pan are linked. The idea is that if I want to view the real component in the range of the 3000th to the 5000th sample, I'd also like to view the imaginary component in the same range.
Ideally I’d like to be able to simply pass a complex number data frame to the plotting functions and the library would just know what to do with it (i.e. the creation of the 2 subplots would be done implicitly and automatically).
I sometimes need to plot the real and imaginary components of a complex signal on same plot, so supporting that would also be nice but I don’t need it as often.
Would https://github.com/Araq/packedjson or the ideas it uses be of help as a common ground for data processing?
With ARC the data doesn't have to be copied between threads, and without ARC the copy is a single allocation + copyMem, which is quite cheap.
So I did a thing today… (which is why I haven't answered yet).
This morning I took another look at a rewrite of the DataFrame using an arraymancer backend. Turns out by rethinking a bunch of things and especially the current implementation of the FormulaNode, I managed to come up with a seemingly working solution.
This is super WIP and I've only implemented mutate, transmute and select so far, but first results are promising.
Essentially the FormulaNode from before is now compiled into a closure, which returns a full column.
So the following formula:
f{"xSquared" ~ "x" * "x"}
will assume that each string is a column of a data frame and create the following closure:
proc(df: DataFrame): Column =
  var
    colx_47075074 = toTensor(df["x"], float)
    colx_47075075 = toTensor(df["x"], float)
    res_47075076 = newTensor[float](df.len)
  for idx in 0 ..< df.len:
    []=(res_47075076, idx, colx_47075075[idx] * colx_47075074[idx])
  result = toColumn res_47075076
The data types of the columns and of the result are currently determined by heuristics based on what appears in the formula. E.g. if math operators appear it's float, if boolean operators it's bool, etc.
The data frame now looks like:
DataFrame* = object
  len*: int
  data*: Table[string, Column]
  case kind: DataFrameKind
  of dfGrouped:
    # a grouped data frame stores the keys of the groups and maps them to
    # a set of the categories
    groupMap: OrderedTable[string, HashSet[Value]]
  else: discard
where a Column is:
Column* = object
  case kind*: ColKind
  of colFloat: fCol*: Tensor[float]
  of colInt: iCol*: Tensor[int]
  of colBool: bCol*: Tensor[bool]
  of colString: sCol*: Tensor[string]
  of colObject: oCol*: Tensor[Value]
colObject is the fallback for columns, which contain more than one data type.
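Reading a column back out then just means dispatching on the kind. Roughly (a sketch building on the Column type above, not the actual implementation):

import arraymancer

proc toTensor(c: Column, dtype: typedesc[float]): Tensor[float] =
  case c.kind
  of colFloat: result = c.fCol # native float column, no conversion needed
  of colInt: result = c.iCol.map_inline(x.float) # convert element-wise
  else:
    raise newException(ValueError, "cannot read " & $c.kind & " as float")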
So I only wrote a super simple for loop to get a rough idea of how fast/slow this might be:
import arraymancer_backend
import seqmath, sequtils, times
#import ggplotnim # for comparison with current implementation

proc main(df: DataFrame, num: int) =
  var df = df # shadow the parameter so we can reassign it
  let t0 = cpuTime()
  for i in 0 ..< num:
    df = df.mutate(f{"xSquared" ~ "x" * "x"})
  let t1 = cpuTime()
  echo "Took ", t1 - t0, " for ", num, " iter"

proc rawTensor(df: DataFrame, num: int) =
  var t = newTensor[float](df.len)
  let xT = df["x"].toTensor(float)
  let t0 = cpuTime()
  for i in 0 ..< num:
    for j in 0 ..< df.len:
      t[j] = xT[j] * xT[j]
  let t1 = cpuTime()
  echo "Took ", t1 - t0, " for ", num, " iter"

when isMainModule:
  const num = 1_000_000
  let x = linspace(0.0, 2.0, 1000)
  let y = x.mapIt(0.12 + it * it * 0.3 + 2.2 * it * it * it)
  var df = seqsToDf(x, y)
  main(df, num)
  rawTensor(df, num)
Gives us: new DF:
raw arraymancer tensor:
While the old DF took 23.3 seconds for only 100,000 iterations! So it's about a factor of 23 slower than the new code.
A probably really bad comparison with pandas:
import time
import numpy as np
import pandas as pd

x = np.linspace(0.0, 2.0, 1000)
y = (0.12 + x * x * 0.3 + 2.2 * x * x * x)
df = pd.DataFrame({"x" : x, "y" : y})

def call():
    t0 = time.time()
    num = 100000
    for i in range(num):
        df.assign(xSquared = df["x"] * df["x"])
    t1 = time.time()
    print("Took", (t1 - t0), "for", num, "iterations")

call()
Took 60.24467134475708 for 100,000 iterations. I suppose using assign and accessing the columns like this is probably super inefficient in pandas?
And a (also not very good) comparison with NimData
import nimdata
import seqmath, sequtils, times, sugar

proc main =
  let x = linspace(0.0, 2.0, 1000)
  let y = x.mapIt(0.12 + it * it * 0.3 + 2.2 * it * it * it)
  var df = DF.fromSeq(zip(x, y))
  df.take(5).show()
  echo df.count()
  const num = 1_000_000
  let t0 = cpuTime()
  for i in 0 ..< num:
    df = df.map(x => (x[0], x[0] * x[0])).cache()
  let t1 = cpuTime()
  echo "Took ", t1 - t0, " for ", num, " iter"

when isMainModule:
  main()
Took 16.322826325 for 1,000,000 iter
I'm definitely not saying the new code is faster than NimData or pandas, but it's definitely promising!
I'll see where this takes me. I think though I managed to implement the main things I was worried about. The rest should just be tedious work.
Will keep you all posted.
Some simple benchmarks comparing the new backend to pandas at:
https://github.com/Vindaar/ggplotnim/tree/arraymancerBackend/benchmarks/pandas_compare
Note that I ran the code with a default pandas installation on my Void Linux machine, without BLAS. But I also compiled the Nim code without BLAS support.
It's just a port of a pandas / numpy comparison from here:
https://github.com/mm-mansour/Fast-Pandas
All in all the new backend (let's call it datamancer from now on, heh) is significantly faster for all operations which essentially just rely on @mratsim's work.
For a few others, specifically unique and sorting, it's slightly slower. But given the implementation of those I'm actually rather happy with that.
And especially for small data frame sizes, the function call / looping overhead Python has to bear is ridiculous.
I'll focus on finishing up the open PR (ridgelines and a bit more) and then finish this.
Just copy-paste or submodule it into your code, Laser is more of a research repo right now.
Compatibility with Arraymancer will be much easier once the following PR is merged, and Arraymancer can use raw buffers coming from any libraries (including Numpy) without copy: https://github.com/mratsim/Arraymancer/pull/420
It's unfortunately blocked by a Nim bug in the GC.
Thanks, maybe I'll give it a try and include it manually in the repo!
improve performance and usability on complex apply/map
It will definitely help, but I'm already creating a single loop for each formula, no matter how many tensors are involved.
E.g.
let df = ... # some DF w/ cols A, B, C, D
df.mutate(f{"Foo" ~ `A` * `B` - `C` / `D`})
will already be rewritten to:
var
  col0_47816020 = toTensor(df["A"], float)
  col1_47816021 = toTensor(df["B"], float)
  col2_47816022 = toTensor(df["C"], float)
  col3_47816023 = toTensor(df["D"], float)
  res_47816024 = newTensor[float](df.len)
for idx in 0 ..< df.len:
  []=(res_47816024, idx, col0_47816020[idx] * col1_47816021[idx] -
    col2_47816022[idx] / col3_47816023[idx])
result = toColumn res_47816024
which is indeed a little slower than a manual map_inline, but still pretty fast. Compare the first plot from here:
https://github.com/Vindaar/ggplotnim/tree/arraymancerBackend/benchmarks/pandas_compare
Not sure where the variations map_inline sees are coming from though. Effects of OpenMP?
Small aside about the types
The data types are determined as floats from the usage of *, / etc. They can be overridden by giving type hints:

f{int -> float: ...}
  ^--- type of the involved tensors
         ^--- type of the resulting tensor
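For example, to keep integer columns as integers instead of the float default (a made-up formula, assuming columns a and b exist):

# `+` would normally make the formula default to float; the hint
# forces reading and writing int tensors instead
df.mutate(f{int -> int: "total" ~ `a` + `b`})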
AFAIK it would allow combining complex transformations and doing them in a single pass instead of allocating many intermediate data frames, so performance can be an order of magnitude faster on zip/map/filter chains.
While this is certainly exciting to think about, I think it'd be pretty hard (for me in the near future anyways) to achieve while:
But this is just me speculating from the not all that simple code of zero-functional. I guess having a custom operator like it does would allow us to replace the user given proc names though.
If you have a better idea of how to do efficient chaining that seems reasonable to implement, I'm all ears.
what I'm working on
Right now I'm rather worrying about having decent performance for group_by and inner_join though. I'm looking at https://h2oai.github.io/db-benchmark/ since yesterday. It's a rather brutal reality check, hehe.
Comparing my current code on the first of the 0.5 GB group_by examples to pandas and data.table was eye opening. In my current implementation of summarize for grouped data frames I actually return the sub data frames for each group and apply a simple reduce operation based on the user's formula. Well, what a surprise, that's slow. I haven't dug deep into data.table or pandas yet, but as far as I can tell they essentially special case group_by + other operation and handle these by just aggregating over all groups in a single pass.
So I've implemented the same and even for a single key with a single sum I'm 2 times slower than running the code with pandas on my machine. To be fair, performing operations on sub groups individually is a nice 100x slower than pandas.
Still, the biggest performance hit I have to take is in order to allow grouping by columns with multiple data types. I need some way to check which subgroup a row belongs to. Since I can't create a tuple at runtime in order to just use normal comparison operators, I decided to calculate a hash for each row and compare that. It works well, but gives me that 2x speed penalty.
For the time being though, I think I'm happy with that, unless I have a better idea or someone can point me to something that works in a typed language and doesn't involve a huge amount of boilerplate code.
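For illustration, the row hashing idea boiled down to a minimal sketch (assumes some df[key][idx] element accessor returning a hashable Value; not the actual implementation):

import hashes, tables

# bucket row indices by the combined hash of their group-by key values
proc groupRows(df: DataFrame, by: seq[string]): Table[Hash, seq[int]] =
  result = initTable[Hash, seq[int]]()
  for idx in 0 ..< df.len:
    var h: Hash = 0
    for key in by:
      h = h !& hash(df[key][idx]) # mix in each key column's value
    h = !$h # finalize the combined hash
    result.mgetOrPut(h, @[]).add idx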
So I'm currently working on an implementation that allows using user defined formulas for aggregation without having to call a closure for each row.
Ok, so I just merged the arraymancer backend PR, which includes the PR for version v0.2.0.
v0.2.0 mainly added ridgeline plots and scale_*_reverse. Note that there is currently no recipe for a ridgeline plot; that will be added in the next few days. Also they are not yet as nice as they should be (essentially the top ridge doesn't change its height depending on the max values in the ridge, if overflowing of ridges into one another is allowed).
scale_*_reverse just allows to reverse scales as the name suggests.
Aside from that a few smaller things were added (theme_void) and a few recipes that use geom_tile (annotated heatmap and plotting the periodic table).
I'm not entirely happy with the state of version v0.3.0 though, since the formula mechanism introduces several breaking changes. Arguably reading formulas is now clearer (see the beginning of the README and especially the recipes, since they all have to be compliant with the new mechanism!), but it still requires code to be changed.
I think the amount of breakage is probably not that large, since not that many people will have used formulas for things anyways yet. Also because using the DF was discouraged before, since it was slow.
Simple formulas, e.g. f{"hwy"}, remain unchanged anyways, same as f{5} to set some constant value for an aesthetic. Previously, formulas were only required for numbers not referring to columns, since the aes proc took string | FormulaNode. Now plain numbers are supported directly, so to set some constant value you can just write aes(width = 0.5) instead of aes(width = f{0.5}).
In any case, I wanted to get this PR off my chest, since it was way too large. I tried to avoid breaking changes as much as possible via macro magic, but this issue:
https://github.com/nim-lang/Nim/issues/13913
was the nail in the coffin. So I'm just releasing it now.
Feel free to open issues in case I broke your code. :)
I'm happy to say that facet_wrap is finally back with version v0.3.5.
Normal classification by (in this case 2) discrete variable(s):
Classification by discrete variable with free scales:
See the code for these two here: https://github.com/Vindaar/ggplotnim/blob/master/recipes.org#facet-wrap-for-simple-grid-of-subplots
Other notable changes of the last few versions include:
See the full changelog for all recent changes:
https://github.com/Vindaar/ggplotnim/blob/master/changelog.org
Sorry about that. When I started writing this I had no idea cairo would be such a pain on Windows.
There's an issue about it here: https://github.com/Vindaar/ggplotnim/issues/57
I haven't updated the README yet, mostly because I don't have a good solution either yet. The easiest for me on a practical level was to just install emacs and add it to my PATH (which is I guess equivalent to you using the Inkscape libraries).
I guess I can think about either adding working versions of the required libraries to the repository for Windows (at least win64), or a script which clones the cairo repository and builds it locally. I haven't built cairo locally yet, so I don't know how well that works.
Now regarding your actual question. If you want to ship a program, which uses ggplotnim internally, you have to do what people do on Windows as far as I know: bundle all required DLLs with the program.
The other alternative would be a static build of cairo. I'll see what I can do to improve the situation. Thanks for the input!
I wasn't aware of the GR framework. It certainly looks interesting. However, it does not look more lightweight than cairo. Just having Qt as a dependency is an immediate no-go for me, at least for a default backend (unless I'm missing something and you can easily get binaries without the Qt dependency and build it without Qt).
Also it obviously does a lot more than cairo. It's a full fledged visualization library.
For ggplotnim's purposes the only advantage it would have would be access to more backends, as far as I can see.
Adding a new backend to ginger is in principle as easy as providing these procs:
https://github.com/Vindaar/ginger/blob/master/src/ginger/backendDummy.nim
And see the actual cairo backend:
https://github.com/Vindaar/ginger/blob/master/src/ginger/backendCairo.nim
So feel free to add a new GR backend to ginger if you'd like!
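Just to sketch the shape of it (purely illustrative proc signatures; the real required set is defined in backendDummy.nim linked above):

type
  GrBackend = object # would wrap the GR framework's drawing state

proc drawLine(b: var GrBackend, start, stop: (float, float)) =
  discard # map to the corresponding GR line primitive

proc drawText(b: var GrBackend, text: string, at: (float, float)) =
  discard # map to GR's text rendering

proc drawPoint(b: var GrBackend, at: (float, float)) =
  discard # map to a GR marker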
To me the most important features I want from backends are:
I can totally see how GR could be a great foundation to build a powerful visualization library on, if used from the onset. It seems to take care of a lot of annoying details I had to get right myself.