This blog post https://nim-lang.org/blog/2017/05/25/faster-command-line-tools-in-nim.html showed that we can quickly write a Nim tool that sums up numbers in a TSV file, and that the Nim code runs faster than the Python and D versions.
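For readers who haven't seen the benchmark, here is a rough Python sketch of that kind of task. It sums a numeric column per key in a tab-separated file and picks the key with the largest total. The column layout, the `sum_by_key` helper and the sample data are hypothetical, and the linked post may use a different benchmark setup.

```python
import io
from collections import defaultdict


def sum_by_key(lines, key_col, value_col):
    """Hypothetical helper: sum value_col for each distinct value of key_col.

    Assumes plain tab-separated lines with no header and no quoting.
    """
    totals = defaultdict(float)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        totals[fields[key_col]] += float(fields[value_col])
    return dict(totals)


# Made-up three-column input: key, unused field, value
sample = io.StringIO("a\tx\t1.5\nb\ty\t2\na\tz\t3\n")
totals = sum_by_key(sample, key_col=0, value_col=2)
best = max(totals, key=totals.get)
print(totals, best)  # {'a': 4.5, 'b': 2.0} a
```

This kind of single-pass loop is easy to write in any language. The harder part, as the rest of the thread discusses, is the richer dataframe functionality around it.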
However, in the real world the situation is more complex than just computing a sum. In my latest work I have to handle many Excel XLS and XLSX files supplied by other departments, so I need to merge, group, query, build pivot tables (pivot_table), drop NaN, fill NaN, and read/write XLS/XLSX files.
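For reference, here is a minimal pandas sketch of most of the operations listed above: fill/drop NaN, merge, query, group and pivot_table. It uses small made-up frames instead of real department spreadsheets, and the column names and data are hypothetical. Excel I/O appears only in a comment, because `pd.read_excel` and `DataFrame.to_excel` need an extra engine package installed.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for sheets from two departments
sales = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south"],
    "product": ["a", "a", "b", "b", "a"],
    "units": [10, np.nan, 5, 7, 3],
})
prices = pd.DataFrame({"product": ["a", "b"], "price": [2.0, 3.5]})

# Fill NaN / drop NaN
filled = sales.fillna({"units": 0})
dropped = sales.dropna(subset=["units"])

# Merge: SQL-style left join on a shared key column
merged = filled.merge(prices, on="product", how="left")
merged["revenue"] = merged["units"] * merged["price"]

# Query: filter rows with an expression string
big = merged.query("revenue > 10")

# Group: total revenue per region
per_region = merged.groupby("region")["revenue"].sum()

# Pivot table: regions as rows, products as columns, missing cells -> 0
pivot = merged.pivot_table(index="region", columns="product",
                           values="units", aggfunc="sum", fill_value=0)

# Excel files would be read/written with pd.read_excel(...) and
# DataFrame.to_excel(...), which require an engine such as openpyxl (xlsx).
print(pivot)
```

Each step returns a new DataFrame or Series, so the operations can be chained in whatever order a given report needs.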
Since I am more familiar with Python than with C and Nim (OK, to be frank, I like to say I know nothing about C and Nim), I searched around and found that only Pandas fits my situation. The next possible solution might be R with some packages. I did not find an easy way to do this in C/C++.
Is there any way to do what Pandas does in Nim? Thanks
Hello,
Take a look at https://github.com/bluenote10/NimData
"NimData is inspired by frameworks like Pandas/Spark/Flink/Thrill, and sits between the Pandas and the Spark/Flink/Thrill side. Similar to Pandas, NimData is currently non-distributed, but shares the type-safe, lazy API of Spark/Flink/Thrill."
AFAIK, NimData only handles CSV, not XLS/XLSX at this time. Still, I guess https://github.com/shawnye/xlsx2csv can help you out.
Cheers
Remember that pandas is not just a command-line tool. It was started in 2008, on top of an already strong ecosystem (NumPy, SciPy, ...), by someone who worked on it full-time as part of his daily job.
My suggestion: if you're familiar with pandas and it's the best tool for your job, use it! In my opinion there is no need for one tool/language/framework to rule them all.
Now, if you have time, you are more than welcome to contribute (code, documentation, tests, examples). Several people see the potential of Nim for scientific and numerical computing and have expressed interest in that ecosystem, so you're not alone. (Disclaimer: I am building a NumPy/Torch-like library in Nim.)
A bit off-topic for Nim, but in general, for efficient pandas operations you should never use Python for loops. Use the provided pandas methods instead, or the "apply / applymap" functions if you want to apply a function element-wise. "Groupby + transform" (aka split-apply-combine) is also really powerful for computing statistics (mean, median, std dev, ...) or custom functions on subsets of your data.
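Here is a small sketch of the split-apply-combine pattern described above, on made-up data with hypothetical column names. `groupby(...).transform(...)` returns a result aligned with the original rows, so every row gets its own group's statistic without an explicit loop. Note that recent pandas versions (2.1+) have renamed `DataFrame.applymap` to `DataFrame.map`.

```python
import pandas as pd

# Hypothetical scores for two teams
df = pd.DataFrame({
    "team": ["x", "x", "y", "y", "y"],
    "score": [4.0, 6.0, 1.0, 2.0, 6.0],
})

grouped = df.groupby("team")["score"]

# Built-in statistic, broadcast back to every row of its group
df["team_mean"] = grouped.transform("mean")
df["centered"] = df["score"] - df["team_mean"]

# Custom function applied per group: min-max scaling within each team
df["scaled"] = grouped.transform(lambda s: (s - s.min()) / (s.max() - s.min()))

# Element-wise function on one column with Series.apply (no explicit for loop)
df["label"] = df["score"].apply(lambda v: "high" if v >= 4 else "low")

print(df)
```

Built-in aggregations passed by name (such as `"mean"`) are typically faster than Python lambdas, so prefer them when one exists.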
Are you sure you are exporting to DOCX and not XLSX? I had a good experience with xlsxwriter for that.
If that still isn't enough, maybe take a look at Numexpr and, lastly, Dask. Dask dataframes mirror the pandas interface and can be converted to and from pandas dataframes, but every operation builds a computational graph that is only optimized and executed in parallel when you call compute(). There are a few more data-processing speed/memory tricks on my blog here.
Yes, I mean DOCX (not XLSX), because my work is part of a larger project: my tabular data, with special formatting, will be combined with material (text, data, graphics, etc.) supplied by other people.
Dask does not support reading XLS/XLSX files, which I have to use.
Sorry to continue the discussion here, but this forum works differently and I can't find a way to send a private message.
The library https://libxlsxwriter.github.io looks very interesting, and it would be a great help for my first production job with Nim.
However, can you please give me some pointers on how to call a C function (from libxlsxwriter) in my Nim code? Is there an existing wrapper for that?
All the best
As far as I know, there is no Nim wrapper for libxlsxwriter. The "Interfacing C and Nim" and "Converting C code to Nim" sections of https://github.com/nim-lang/Nim/wiki/Nim-for-C-programmers may be useful.
As for reading XLS, I found http://libxls.sourceforge.net/
As a commercial option for reading/writing XLS/XLSX, I found http://libxl.com/
All of the above are C libraries.
Setting up Git repo: https://github.com/jmcnamara/libxlsxwriter
Pulling repository
Processing nimlibxlsxwriter/include/xlsxwriter.h
Processing nimlibxlsxwriter/include/xlsxwriter/workbook.h
Processing nimlibxlsxwriter/include/xlsxwriter/worksheet.h
Processing nimlibxlsxwriter/include/xlsxwriter/shared_strings.h
Processing nimlibxlsxwriter/include/xlsxwriter/common.h
Processing nimlibxlsxwriter/include/xlsxwriter/third_party/queue.h
Generating nimlibxlsxwriter/queue.nim
Processing nimlibxlsxwriter/include/xlsxwriter/third_party/tree.h
Generating nimlibxlsxwriter/tree.nim
Command failed: 1
cmd /c "c2nim --stdcall --dynlib:dynlibtree --out:nimlibxlsxwriter/tree.nim temp-tree.nim.c"
d:\Nim\lib\system\fatal.nim(39) sysFatal
Error: unhandled exception: assignment to discriminant changes object branch; compile with -d:nimOldCaseObjects for a transition period [FieldError]
stack trace: (most recent call last)
Any clue?
Looks like this package broke after a deprecation in Nim 0.20.0.
case object branch transitions via system.reset are deprecated. Compile your code with -d:nimOldCaseObjects for a transition period.
So the best short-term "fix" would be to clone that repo and build it locally using nim c -d:nimOldCaseObjects ...