nimforum mirror - Numerical libraries for data science

haghighat (orginal) [2023-01-17T12:03:40+01:00] view original

Hi everyone

I am mainly working on data science with Python. It is a tradition that nowadays people implement everything in other languages for speed reasons and wrap them in python to get simple code and other utilities of python.

I was looking for good languages for this. C++ with Pybind11 is an option that I am using now. It produces python packages after compilation but it needs to deal with all issues of C++ and coding and development is not as fast and scalable as as I like.

I am new to Nim and could almost learn its basics over a night.

My question is that if

Nim would be a good option to produce fast python packages?

I need to use a lot of multiprocessing to parallelize things. Is this quite smooth in Nim? or is it crazy as in C++? or almost impossible due to GIL as in

Python.

I also learned Rust and it produces packages in python. The only problem is that simple things are complicated in Rust and in short-run it's not worth

it.

I would be really grateful if I know your suggestions on this.

Thanks.

UxDnz0 (orginal) [2023-01-17T13:49:02+01:00] view original

Yeah I mean I think this is exactly where nim shines as opposed to Rust, as you pick out; Nim keeps the simple things simple and the hard things possible, Rust instead is trying to market a lego brick with 3.46 dimensions.

Nimpy is operatively intuitive, fast, 2-way, etc. So use that for 98% of what you're asking.

If you care for optimizations take a look at some of Treeform's work with Guzba, especially on the SIMD side.. I've never seen optimizations be so mature(as opposed to pre-mature), aka not deeply dug into the fractational dimensional Lego house that you are trying to keep stable in int:=3 dimensional space.

Parallel is relatively plenty mature in this ecosystem, the tricky bit is that while nim keeps easy things easy, difficult things stay difficult, although crucially, and hopefully up to your wits; Macros should make this a temporary phenomenon until you have refined your spec and DSL.

xigoi (orginal) [2023-01-17T14:17:56+01:00] view original

Why use Nim to make packages for Python when you can make packages for Nim?

Clonk (orginal) [2023-01-17T14:47:34+01:00] view original

Nim would be a good option to produce fast python packages?

Not really. While it is possible - there are package like nimporter who can export Nim code to Python - it's overly complicated for little benefit. If you start using Nim, it's best to work entirely in Nim and use C++ library through FFI for what's missing.

I need to use a lot of multiprocessing to parallelize things. Is this quite smooth in Nim? or is it crazy as in C++? or almost impossible due to GIL as in Python.

Multicore processing is relatively okay. The threading is something that should be improved in 2.0, there are work in progress related to that (see ßtd/tasks for example).

Otherwise, it's relatively easy to use openMP. Parallelism is handled for you in Arraymancer, you can also use Weave or https://github.com/status-im/nim-taskpools manipulating parallisation without having to rely on a thread object.

I also learned Rust and it produces packages in python. The only problem is that simple things are complicated in Rust and in short-run it's not worth it.

The main benefit of Python is easier-to-use API compared to C++ and Rust. Nim doesn't need Python to produce nice API, mostly du to its metaprogramming features and clean syntax. Small program stay simple but the language can keep on scaling even on bigger project.

I would start with taking a look at Arraymancer to see what already exists

giaco (orginal) [2023-01-17T15:38:46+01:00] view original

I do use nim with python daily.

On the nim side, you have all the scinim stuff

Then use nimpy to creative a native .so/.dll python module

Import your native module in python and enjoy full speed

juancarlospaco (orginal) [2023-01-17T20:26:16+01:00] view original

https://github.com/juancarlospaco/cpython tries to be useful for Nim/Python

haghighat (orginal) [2023-01-17T22:35:51+01:00] view original

haha I like that. The only things is that nowadays in data science everything happens under Python cover so this is very important to keep the interface python.

haghighat (orginal) [2023-01-17T22:37:52+01:00] view original

Thank you so much for the references. I will start taking a look and see if I can find an easy way.

haghighat (orginal) [2023-01-17T22:38:50+01:00] view original

I need to take a look. If it is that simple, it would be great!

auxym (orginal) [2023-01-17T23:57:51+01:00] view original

Have a look at genny to automate python bindings to nim: https://github.com/treeform/genny

And weave for parallel processing: https://github.com/mratsim/weave

AMoura (orginal) [2023-01-24T20:33:59+01:00] view original

Nimporter v2 released soon. https://github.com/Pebaz/nimporter/tree/nimporter-v2.0.0rc

giaco (orginal) [2023-01-25T03:12:35+01:00] view original

I'm using nimpy to export nim functions as python native module, and scinim+arraymancer / numpy as cross-boundary agents to move buffers back and forth.

It works nice. I've been using this to run scientific models since an year'ish

pietroppeter (orginal) [2023-01-25T08:29:34+01:00] view original

Do you have some examples to share?

haghighat (orginal) [2023-01-26T03:03:05+01:00] view original

thanks a lot for your response.

I started playing with Nim and I really like it. There are a variety of tools in data science such as numpy, pandas, torch, jax, etc. where all do part of the job.

I guess the main potential of Nim would be in writing parts of code for better efficiency. I believe Nim can completely replace Cython language in that sense, which is very unrobust. Also, with multi-processing feature, Nim can be super useful.

I was working on a project which I implemented by C++ & Pybind11 but debugging the whole code took more than 1 week. I did it in a couple of hours by Nim. So, clean code of Nim should not be ignored.

I agree about integration with Numpy but I guess it would need some work because in Numpy the underlying data is always in a big consecutive chunk of memory but it uses a View or Stride technique to manage the way this data is indexed.

For example, if data is 1-dim the stride=1 which means by changing the index I move 1 step in the memory. But now assume the same and I put a stride = (1,8), then it means the first index always moved with 1 step bit second index by 8. This can be interpreted as as 2D n x 8 matrix.

haghighat (orginal) [2023-01-26T03:07:06+01:00] view original

Thanks a lot.

I tried nimpy + nimporter and it worked very well for me.

The only thing I was worried was that some time one does not know how data is mapped between Python and Nim. For example, I had a binary signal and used int8 in Nim but what I received on the python side with int64. I guess it is because Python does not have int8, int16, ...

Also, at the moment, the only way to transfer the data between Nim and Python is via list[Python] <-> Seq[Nim], which has its own limitations.

I wish there was a way to manage this better. Then Nim would be really wonderful.

giaco (orginal) [2023-01-26T06:48:46+01:00] view original

that's why you should use arraymancer Tensor on the Nim side, and numpy on the python side.

you can convert the two with scinim and keep the internal data type consistent.

AMoura (orginal) [2023-01-26T09:13:46+01:00] view original

I aggree. Publish lot of Nim library to Python with nimporter/nimpy is a very good, the best for me, to attract developers.

I think that a SciNim Python version that expose some data science libraries can be very good to highlight the Nim language.

haghighat (orginal) [2023-01-26T11:47:10+01:00] view original

Thanks a lot. I need to check this; If it works, it would be perfect.

haghighat (orginal) [2023-01-29T00:06:40+01:00] view original

Hi,

I had a look on Tensor class from scinim, which is very close to Numpy in Python. I tried to use Nimpy with Tensor class and called the exported function with numpy class. I receive memory segmentation error.

Do you have perhaps a code snippet how you export things in Nim side via Nimpy and call then by numpy on python side?

Thnaks.

Mirror of forum.nim-lang.org

9831 :: Numerical libraries for data science