Hey everyone. I have been reading up on Nim for a couple of days and am very interested in it. I'm a Python dev doing mostly analytics (and, if I'm lucky, some actual data science). Pandas is my workhorse, and having dataframes is pretty crucial to my being able to use a language.
The only pandas-esque package I could find is NimData, and it seems to be abandoned. Is there an active package out there doing similar things? Or another way to get at that kind of functionality that I'm not seeing?
Thanks for your help!! Very much looking forward to moving some of my work into a more performant language.
It's not abandoned; @bluenote10 replies very quickly if you open an issue.
Unfortunately, reimplementing pandas is a very large undertaking.
If you like tinkering, the best option would probably be to wrap Nvidia's cuDF C++ API at https://github.com/rapidsai/cudf using nimterop or the techniques used in NimTorch.
If, however, you want something usable right now, you need to use both Nim and Python at the same time, using nimpy and/or some Jupyter magic.
Might also be worth looking at Apache Arrow:
https://wesmckinney.com/blog/apache-arrow-pandas-internals/
I don't think such a pandas-like thing exists in any statically typed language.
First of all, there is no xlrd/openpyxl/etc equivalent which can enable reading data into a DataFrame.
Then, can the data type of a column be inferred by a pandas-like Nim library? I suspect that we would have to pre-define the dtype of every column before we can read the data.
First of all, there is no xlrd/openpyxl/etc equivalent which can enable reading data into a DataFrame.
That doesn't make any sense; any programming language can be used to write an Excel reader. The libraries you mentioned only read XLS files.
Then, can the data type of a column be inferred by a pandas-like Nim library? I suspect that we would have to pre-define the dtype of every column before we can read the data.
We already have a pandas-like library for Nim:
https://github.com/bluenote10/NimData
and numpy-like
Then, can the data type of a column be inferred by a pandas-like Nim library? I suspect that we would have to pre-define the dtype of every column before we can read the data.
Data types can be inferred, and Datamancer does that. For an example, see:
https://pietroppeter.github.io/nimib/penguins.html
where some columns are inferred as numeric values (int, float), some as string, some as generic "object" (which is a variant type that can contain any of the above and also bool).
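To make the idea concrete, here is a minimal Python sketch of per-column type inference from raw text cells. It is not how Datamancer implements it; the function names `classify` and `infer_column_type` and the promotion rules (int plus float becomes float, anything else mixed becomes "object") are assumptions chosen for illustration.

```python
def classify(cell: str) -> str:
    """Classify one raw text cell as 'int', 'float', or 'string'."""
    try:
        int(cell)
        return "int"
    except ValueError:
        pass
    try:
        float(cell)
        return "float"
    except ValueError:
        return "string"


def infer_column_type(cells) -> str:
    """Infer a single type label for a whole column of raw text cells.

    Assumed rules (illustrative only):
      - all ints            -> 'int'
      - ints and/or floats  -> 'float'
      - all strings         -> 'string'
      - anything else mixed -> 'object' (a catch-all variant type)
    """
    kinds = {classify(c) for c in cells}
    if kinds == {"int"}:
        return "int"
    if kinds and kinds <= {"int", "float"}:
        return "float"
    if kinds == {"string"}:
        return "string"
    return "object"
```

With these rules, a column like `["1", "2.5"]` would come out as float, while `["1", "NA"]` would fall back to the generic object type.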
Note: the example predates Datamancer's existence as an independent package; it started out inside ggplotnim.
I agree that it would be nice if more people would join Nim.
As far as I can see, those libraries do have proper maintainers (people who answer issues and can direct contributors); they could likely benefit from more contributors (and users).
First of all, there is no xlrd/openpyxl/etc equivalent which can enable reading data into a DataFrame.
That doesn't make any sense; any programming language can be used to write an Excel reader. The libraries you mentioned only read XLS files.
What I mentioned are exactly the libraries that can READ XLS/XLSX; a pandas-like DataFrame library can then use them to read the data. At least for me, I have had to read both XLS and XLSX into pandas for advanced processing for the past 5 years.
Reading XLS / XLSX is no different from reading CSV files from the point of view of a dataframe library. Both require runtime-based type determination of the different columns (or require the user to give a static schema at compile time).
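As a rough illustration in Python (not Nim, and not any real dataframe library's API), the sketch below builds typed columns from rows of text cells without caring where the rows came from. The helper names `guess_converter` and `to_columns` are hypothetical, the "spreadsheet" is just an in-memory list standing in for whatever an XLSX reader would return, and a Python dict of converters only approximates a compile-time schema.

```python
import csv
import io


def guess_converter(cells):
    """Probe a column at runtime: try int, then float, else keep strings."""
    for convert in (int, float):
        try:
            for cell in cells:
                convert(cell)
            return convert
        except ValueError:
            continue
    return str


def to_columns(header, rows, schema=None):
    """Turn rows of text cells into a dict of typed columns.

    schema: optional dict of column name -> converter. If given, it plays
    the role of a user-supplied schema; if omitted, types are guessed.
    """
    raw = {name: [] for name in header}
    for row in rows:
        for name, cell in zip(header, row):
            raw[name].append(cell)
    typed = {}
    for name, cells in raw.items():
        convert = schema[name] if schema else guess_converter(cells)
        typed[name] = [convert(cell) for cell in cells]
    return typed


# Source 1: CSV text parsed with the standard library.
csv_text = "species,mass_g\nAdelie,3750\nGentoo,5000.5\n"
reader = csv.reader(io.StringIO(csv_text))
csv_header = next(reader)
from_csv = to_columns(csv_header, reader)

# Source 2: stand-in for rows extracted from a spreadsheet (assumed text cells).
sheet_rows = [["species", "mass_g"], ["Adelie", "3750"], ["Gentoo", "5000.5"]]
from_sheet = to_columns(sheet_rows[0], sheet_rows[1:])

# Same data with an explicit schema instead of runtime guessing.
with_schema = to_columns(
    sheet_rows[0], sheet_rows[1:], schema={"species": str, "mass_g": float}
)
```

Both sources go through the same `to_columns` step, which is the point being made: the file format only affects how rows are extracted, not how column types are determined.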
For example it wouldn't be hard to use https://github.com/xflywind/xlsx to read an XLSX file and then convert it to a Datamancer dataframe. It would be less efficient than parsing from CSV, but that's not for any technical reason (I simply spent some time to make the CSV parser relatively fast).
I have experience writing my own parser for XLS and XLSX in Python, before xlrd was born (back in 2008). It is not difficult in Python, and the same goes for Nim. Now that we have xlrd, we could port it to Nim (pure Nim, by rewriting it), which would give much higher performance and a static binary.
If anyone would like to fund me to do it, I could take on that task.