There seem to be quite a few data/computational science people lurking around here in the Nim community (@mratsim among others). Is there any interest in creating a place where such topics can be discussed? I think it would be helpful to have a dedicated place, especially for people coming from Python (the main data science language nowadays), who might (as I did) see Nim as a "faster Python". A dedicated place to discuss is also less intimidating than getting into the Nim IRC/Gitter/Discord, especially if you only check it a few times a day and miss most of the discussions.
Is there anyone who would be interested? If so, what platform should we use? (Gitter/IRC, a subreddit, etc.)
Why not just start on the Nim forum and Nim IRC, and switch to a separate area when the number of participants becomes too large, say a few dozen?
I really wonder why some people like to separate so much. We all know that the Nim community is still small, and there is no evidence that many people are really working in data/computational science. Data science discussion on the Nim forum is interesting for many, and it is good advertising for Nim.
For other topics it is similar -- for example, I have had some requests to create a Nim-GTK forum, channel, or IRC, the last one recently, see https://github.com/StefanSalewski/gintro/issues/54#issuecomment-532497567 Well, there would be one user if I subscribed myself, and maybe one more a few times a month.
I get your point ;) The forum works fine, but it's the IRC part that bothers me a bit. I usually check the Gitter once every 2 hours, and by then there has usually been too much going on for me to go through it all. If there has been anything about data science, it has most probably been drowned out by the core devs' discussions or by people asking more general Nim questions (which is really good :-D don't get me wrong, I love that part of the Nim community and its engagement).
I'm also starting this thread to see IF there are people interested ;) If not, I would see no use in a separate community.
I get your point too :-)
Currently there is a lot of discussion on IRC due to the 1.0 release, but generally traffic is lower. I look at the IRC logs sometimes, and I have never seen much discussion about data science. Most of it is compiler dev, bugs, game dev, crypto. And some noise.
My feeling is that when someone starts a discussion and there is someone else interested in it, a longer discussion follows. zcharter does this often, for his game engines. So you may just try to start a data science discussion when there is no other important discussion on IRC and see if other people join in.
But maybe some other people will join this thread, so we will see. I am interested in science too (working on R-tree bulk loading and k nearest neighbor search just now), but I will never join a separate channel.
Hi, +1 to discussing scientific Nim in this forum until the community grows big enough. I do a number of scientific calculations with Nim, but from the angle of stochastic differential equations and Monte Carlo simulations for finance and insurance rather than neural networks and data mining.
The DataTable package was my pilot project to get more familiar with Nim. IMO, finishing it would require a couple of improvements in Nim itself, namely chaining of iterators and static[T] type improvements.
I think we can replace iterator chaining by objects that represent the transformations.
Then it can be applied lazily like Dask (i.e., build a compute graph) or applied like D ranges, a bit like @timotheecour's PoC. AFAIK the new C++20 ranges work like this as well.
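To make the idea concrete, here is a minimal sketch of what "objects that represent the transformations" could look like, with lazy application in the Dask style. All the names here (`LazySeq`, `lazy`, `collect`, etc.) are hypothetical, invented for illustration — this is not from DataTable or any existing library:

```nim
# Each transformation step is recorded as a closure; nothing runs
# until `collect` materializes the result. This replaces chained
# iterators with a small "compute graph" (here just a linear list).
import std/sequtils

type
  Step = proc (s: seq[float]): seq[float]
  LazySeq = object
    source: seq[float]
    steps: seq[Step]

proc lazy(s: seq[float]): LazySeq =
  LazySeq(source: s)

proc map(l: LazySeq, f: proc (x: float): float): LazySeq =
  ## Record a map step; do not execute it yet.
  result = l
  result.steps.add(proc (s: seq[float]): seq[float] = s.mapIt(f(it)))

proc filter(l: LazySeq, p: proc (x: float): bool): LazySeq =
  ## Record a filter step; do not execute it yet.
  result = l
  result.steps.add(proc (s: seq[float]): seq[float] = s.filterIt(p(it)))

proc collect(l: LazySeq): seq[float] =
  ## Materialize: run all recorded steps in order.
  result = l.source
  for step in l.steps:
    result = step(result)

when isMainModule:
  let r = lazy(@[1.0, 2.0, 3.0, 4.0])
            .map(proc (x: float): float = x * x)
            .filter(proc (x: float): bool = x > 2.0)
            .collect()
  echo r  # @[4.0, 9.0, 16.0]
```

A real implementation would of course want to fuse the steps into a single pass instead of allocating an intermediate seq per step, but the chaining ergonomics stay the same.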
I do, in Arraymancer: https://github.com/mratsim/Arraymancer/
For NLP there were some wrappers here: https://github.com/Nim-NLP with a focus on the Chinese language.
For FSMs for NLP, I've come across BlingFire from Microsoft Research, but I guess the most flexible tokenizer is SentencePiece by Google, which does unsupervised training and does not assume anything about the language (e.g., whitespace); you can just give it text to read.
Couldn't agree more! A plotting library that can both do a simple `plot(x, y)` and more advanced customization. Are there any maintained attempts at a Jupyter kernel one could try to help out with?
The path through Python seems like a good one. When Python programmers realize there is a better Cython, things will get fun here ;)
@chemist69 Jupyternim predates hot-code reloading, which was also written with a Jupyter kernel in mind and should be less hacky. No idea though on how to use it in practice.
Docs if someone wants to play with it: https://nim-lang.org/docs/hcr.html
@miran In terms of plotting we have nim-plotly and ggplotnim, which is written from scratch.
On my side I'm still convinced that the Vega ecosystem is probably one of the best ways forward, especially because they provide an open-source Tableau called Lyra (built with feedback from Tableau people) and, most impressively, a tool that does automatic suggestions of data visualizations called Voyager.
This is the video that sold me on Vega from the OpenVis 2015 conference. Focus on Voyager at 19:15 - https://youtu.be/GdoDLuPe-Wg?t=1155.
I have a PoC of calling Vega-Lite from Nim here: https://github.com/numforge/monocle but I have no time to work on it for the foreseeable future.
Something else to put on the radar:
"Pyodide brings the Python runtime to the browser via WebAssembly, along with the Python scientific stack including NumPy, Pandas, Matplotlib, parts of SciPy, and NetworkX"
<https://github.com/iodide-project/pyodide>
Maybe a Nim language plugin in the future?
<https://iodide-project.github.io/docs/language_plugins/>
And WebAssembly is not limited to browsers (although the browser is the new black):
<https://github.com/intel/wasm-micro-runtime/issues/85> <https://github.com/CraneStation/wasmtime>
In Nim, float is currently always 64-bit, the same as float64. float32 is smaller, which means less memory bandwidth is needed, as well as wider vector instructions on modern CPUs (e.g., with AVX you can fit 8 float32 values in one register but only 4 float64). Some neural net people want to use a couple of variants of 16-bit (or even 8-bit) floating point formats, while for certain very high precision cases 128-bit is becoming less rare.
So I don't think there is a perfect choice of size/accuracy, and your SomeFloat idea sounds smarter to me. You might also benefit from having "iterative" numerical routines accept some kind of "precision" parameter that defaults to something sensible (probably dependent upon the concrete floating point type in play), but allows users to trade off speed and accuracy themselves. I.e., if you have some series approximation with an error bound, then allow users to pass the maximum tolerable error (in either absolute or relative terms). Nim's named parameter passing with defaults should make it easy to establish a common convention in your library.
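A small sketch of that convention, assuming nothing beyond Nim's built-in SomeFloat type class: a generic iterative routine whose tolerance defaults per concrete float type but can be overridden by name. `newtonSqrt` and `defaultTol` are made-up names for illustration, not part of any library:

```nim
# Generic over SomeFloat: the same code instantiates for float32 and
# float64, with a type-dependent default tolerance the caller can override.
import std/math

proc defaultTol[T: SomeFloat](): T =
  ## Coarser default for float32 than for float64.
  when T is float32: T(1e-6) else: T(1e-12)

proc newtonSqrt[T: SomeFloat](x: T, tol: T = defaultTol[T]()): T =
  ## Newton's iteration for sqrt(x); stops when the relative update
  ## falls below `tol`.
  assert x >= T(0)
  if x == T(0): return T(0)
  result = x
  while true:
    let next = T(0.5) * (result + x / result)
    if abs(next - result) <= tol * abs(next):
      return next
    result = next

when isMainModule:
  echo newtonSqrt(2.0)              # close to 1.4142135623730951
  echo newtonSqrt(2.0'f32)          # float32 instantiation, coarser default
  echo newtonSqrt(2.0, tol = 1e-3)  # caller trades accuracy for speed
```

The named `tol = ...` override at each call site is exactly the kind of library-wide convention the paragraph above suggests.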