I didn't know about Nim until it popped up on Hacker News recently. I think it is a very good language, and it compares well with Julia, another new language that has gotten a lot of attention.
When it comes to language features, I think Nim stacks up very well against Julia for numeric programming. Nim also seems more mature.
I would like to encourage the language authors/maintainers to think about taking Nim in the direction of numeric computing, statistics, and scientific computing.
thx
Not just a matter of libs I think, or there would be little interest in Julia. I agree with gcoles that Nim as a language has great potential there. For one thing, scientific programmers generally like overloaded operators and procs, and aren't likely to want to write numerics code in, say, OCaml (which I generally quite like) where you have '+' for ints and '+.' for floats.
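To make that concrete, here is a toy sketch (not from this thread; `Vec2` is a made-up type) of how one overloaded `+` can serve built-in and user-defined numeric types alike in Nim, with no `+.`-style variants:

```nim
# One `+` symbol for ints, floats, and user types; `Vec2` is hypothetical.
type Vec2 = object
  x, y: float

proc `+`(a, b: Vec2): Vec2 =
  # User-defined overload; the compiler picks it by static type.
  Vec2(x: a.x + b.x, y: a.y + b.y)

let v = Vec2(x: 1.0, y: 2.0) + Vec2(x: 3.0, y: 4.0)
echo 1 + 2        # int +
echo 1.5 + 2.5    # float +, same symbol
echo v.x, " ", v.y
```

This is the property scientific programmers tend to expect, and it is one Nim shares with Julia.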
There was already a brief discussion here http://forum.nim-lang.org/t/589 so there is certainly interest in this.
I'm looking forward to seeing what you come up with, Mason. How are you planning on modeling multidimensional arrays in Nim? Nim arrays have compile-time bounds as part of their type, and seq is a growable vector, which is more than you need; I think you want a type which is not growable but gets its size at runtime. I guess you'll use seqs underneath, right?
Orion, you are right: a large number of scientists just want to explore data interactively, so a Nim REPL would help. One reason R is so popular, in spite of being a horrible language, is that its plotting libraries (ggplot2!) are wonderful. Same for MATLAB. Python is making big inroads there because it's a decent language, even though dynamically typed, and has extensive libraries. Julia's value proposition is supposed to be that you can have an interactive language that's C-fast.
I work more with signal processing than linear algebra, so I'm hoping I can provide a general purpose multidimensional grid library that can be the basis for an algebra library if someone's interested in writing that. The situation would be like NumPy's, only tensors and grids would be compatible, since it's much harder to run out of operators in Nim (* could mean elementwise multiplication and ** could mean tensor multiplication, or vice versa).
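As a sketch of that operator-richness point, here is what an elementwise `*` and a user-defined `**` contraction could look like on plain seqs (toy stand-ins for a real grid type, not a proposed API):

```nim
# Elementwise product; assumes the seqs have equal length.
proc `*`(a, b: seq[float]): seq[float] =
  result = newSeq[float](a.len)
  for i in 0 ..< a.len:
    result[i] = a[i] * b[i]

# `**` is a perfectly legal user-defined operator in Nim; here it is
# a dot product, standing in for a "tensor-style" contraction.
proc `**`(a, b: seq[float]): float =
  for i in 0 ..< a.len:
    result += a[i] * b[i]

echo @[1.0, 2.0] * @[3.0, 4.0]   # elementwise
echo @[1.0, 2.0] ** @[3.0, 4.0]  # contraction
```

Because new operators can be declared freely, the elementwise/tensor distinction doesn't have to consume the few symbols other languages offer.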
I come from 20 years of C/C++/IDL/Python programming in the domain of scientific computing, and after having got really bored with them (particularly C++ and IDL) I started looking at alternatives. So far the two languages that seem most interesting are Julia and Nim: the first one looks promising for doing REPL work and short scripts, while Nim might be a good choice for larger programs, where the ability to have static types checked at compile time helps a lot.
As others have already said, one of the missing parts of the language is some more versatility with arrays (statically sized arrays whose size is decided at runtime - after all, Ada has had them since 1983!) and ranges (e.g. passing a[4..7] to a function expecting an array reference). I don't think the lack of scientific libraries should be a problem: Nim is one of the languages with the easiest FFI I know. Of all the scientific libraries I use, I have ported only CFITSIO so far (http://forum.nimrod-lang.org/t/678), but the experience has been pretty pleasant. I might start writing some bindings to the HDF5 library pretty soon, too: if anybody is interested, I'll report my progress here.
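To illustrate how thin that FFI is, here is the whole ceremony for importing one C function. (`c_sqrt` is a made-up binding name, and `sqrt` from math.h is only a stand-in; wrapping something like CFITSIO uses the same mechanism, just with more declarations and a linker pragma.)

```nim
# A single pragma-annotated declaration is a complete C binding.
proc c_sqrt(x: cdouble): cdouble {.importc: "sqrt", header: "<math.h>".}

echo c_sqrt(2.0)
```

No wrapper generator or glue layer is strictly required, which is why porting a C scientific library is mostly a matter of transcribing its header.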
@mason_mcgill Sounds good. Any idea how to make multidimensional array access in Nim look like it does in other languages? For example, here's a rough cut of a seq-backed multidimensional array with the element type and dimensions encoded in the type (this isn't to be used for a real matrix library!)
import macros

type
  Matrix*[T, D] = ref object
    dims: D
    data: seq[T]

proc makeMatrix*[T, D](init: T, dims: D): Matrix[T, D] =
  ## Allocates a matrix with the given dimensions, filling it with `init`.
  var ndims = len(dims)
  var count = 1
  if ndims > 0:
    for i in 0 .. <ndims:
      count *= dims[i]
  else:
    count = 0
  new(result)
  result.dims = dims
  result.data = newSeq[T](count)
  for i in 0 .. <count:
    result.data[i] = init

proc mapIndexRowMajor[D](dims: D, indices: varargs[int]): int =
  ## Flattens a multidimensional index, last dimension varying fastest.
  result = 0
  for i in 0 .. <len(indices):
    var prod = 1
    for j in (i + 1) .. <len(indices):
      prod *= dims[j]
    result += prod * indices[i]

proc mapIndexColumnMajor[D](dims: D, indices: varargs[int]): int =
  ## Flattens a multidimensional index, first dimension varying fastest.
  result = 0
  for i in 0 .. <len(indices):
    var prod = 1
    for j in 0 .. <i:
      prod *= dims[j]
    result += prod * indices[i]

proc get[T, D](mat: Matrix[T, D], indices: varargs[int]): T =
  result = mat.data[mapIndexRowMajor(mat.dims, indices)]

proc put[T, D](value: T, mat: var Matrix[T, D], indices: varargs[int]) =
  mat.data[mapIndexRowMajor(mat.dims, indices)] = value

macro `[]`*(g: var Matrix, indicesAndValue: varargs[expr]): expr =
  # Forward all bracket arguments to `get`.
  var indicesTuple = newNimNode(nnkArgList)
  for child in indicesAndValue.children:
    indicesTuple.add child
  quote do:
    get(`g`, `indicesTuple`)

macro `[]=`*(g: var Matrix, indicesAndValue: varargs[expr]): stmt =
  # The last bracket argument is the value; the rest are indices.
  let value = indicesAndValue[indicesAndValue.len - 1]
  var indicesTuple = newNimNode(nnkArgList)
  var indices = indicesAndValue
  indices.del(indices.len - 1)
  for child in indices.children:
    indicesTuple.add child
  quote do:
    put(`value`, `g`, `indicesTuple`)
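For anyone puzzling over the index arithmetic, here is the row-major mapping from the code above in isolation, as a self-contained proc (`rowMajorOffset` is my name for it, not part of the library sketch):

```nim
# Row-major flattening: for dims (2, 3), index (i, j) maps to i*3 + j.
proc rowMajorOffset(dims, idx: openArray[int]): int =
  for i in 0 ..< idx.len:
    var prod = 1
    for j in (i + 1) ..< idx.len:
      prod *= dims[j]   # stride = product of the trailing dimensions
    result += prod * idx[i]

echo rowMajorOffset([2, 3], [1, 2])  # 1*3 + 2 = 5
```

The column-major variant is the mirror image: each stride is the product of the *leading* dimensions instead.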
Edit: I fixed embarrassingly buggy code and incorporated @mason_mcgill's suggestions. It's devilishly hard to figure out how to do some things from the docs. Thanks Mason!
@zio_tom78 Thanks for porting CFITSIO, that's a nice example to start from. I'd also be interested in seeing your bindings to HDF5. I was thinking that for Nim numerics binding to some good, widely used C (not C++!) libraries, like PETSc, would be a good start. Any opinions?
C++ libraries might be better to port over, but as you said, there are some areas where the Nim language could be improved, though I'm reluctant to suggest changes until I'm more fluent and have done some ports.
I'd still like to introduce the [. .] brackets for generic parameter lists in cases of ambiguity, but most people are not too fond of the idea.
I like it. There's a bit of resemblance to the pragma notation. I assume we'd only use them when we had to disambiguate the [] in UFCS, right? Who are these "most people" who aren't too fond of this?
Nimrod might be able to offer both ease of programming and speed to those in the scientific computing community.
Of course Julia seems to be really strong in this special area.
+1 for having more idiomatic support for multidimensional arrays in Nim.
com: There have also been a number of criticisms of the Julia benchmarks.
This is interesting; I thought Julia was quite fast. Do you have any specific links in mind, or can you summarize what the concerns are?
It's not that Julia isn't fast (or fast enough for most things), and there's more to a language than speed; it's just that there were some criticisms of their benchmarks. People made the point that the code in the other languages wasn't optimized the way someone normally would (e.g. using NumPy/SciPy in Python). I didn't look into it deeply, and it was a while ago that I looked at Julia; they may have newer benchmarks making such comparisons.
Some talk of it here
https://groups.google.com/forum/#!msg/julia-users/l-KXBX6327M/Cfcc9SDEHnsJ
https://github.com/JuliaLang/julia/issues/2412
There were other sites, but I can't remember what they were.
+1 for scientific programming in Nim
There are layers of functionality/complexity required (as I see it), based on comparison to Python and R:
1. numeric and linear algebra libraries
2. a dataframe mechanism
3. GroupBy on dataframes
4. plotting
5. handling datasets larger than memory
As I see it, the development and provision of libraries is at stage 1 (and is what you are primarily discussing, although NumPy has been mentioned a number of times). There are quite a few linear algebra libraries currently available via nimble.
To provide the equivalent of Numpy requires (among other things) a dataframe mechanism, which is more than hard-wiring your own seq[]. A dataframe needs to easily handle multiple fields of different types (dates, strings, ints, floats, ....), and easily display the data, .....
I have listed the third option (GroupBy) as distinct from a dataframe, because the result of grouping a dataframe is a dataframe on steroids (a superset of a dataframe). So I assume dataframes are made up of sequences, and GroupBy thingies are made up of dataframes.
sequence -> dataframe -> groupby
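A minimal sketch of one way heterogeneous columns could be represented in statically typed Nim: an object variant per column kind. (`ColKind`, `Column`, and `DataFrame` are made-up names for illustration, not an existing library.)

```nim
# Each column carries a tag saying which typed seq it actually holds.
type
  ColKind = enum ckInt, ckFloat, ckString
  Column = object
    name: string
    case kind: ColKind
    of ckInt: ints: seq[int]
    of ckFloat: floats: seq[float]
    of ckString: strs: seq[string]
  DataFrame = seq[Column]

let df: DataFrame = @[
  Column(name: "id", kind: ckInt, ints: @[1, 2, 3]),
  Column(name: "score", kind: ckFloat, floats: @[0.5, 0.7, 0.9])
]
echo df[1].name, ": ", df[1].floats
```

A real design would need date columns, missing values, and display code on top, but the variant approach shows mixed types don't require dynamic typing.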
Point 4 would be good if it were easy to port pyplot to Nim (but it is highly Python-dependent, IIRC). Someone has already provided a library for accessing gnuplot, so that might be an option (although its output is not quite as nice).
Point 5 is for handling "large" datasets. Not many people will take it seriously if the program dies because it couldn't fit the data into memory. It needs to seamlessly page data to disk as an option for analysing large datasets. The Spills library does this, but that functionality would need to be included in the dataframe and groupby functionality.
For example, in Go you can link with BLAS (the Fortran reference implementation), or you can use a pure-Go BLAS. That flexibility is important because some people want the speed and comfort of the old workhorses, but others want to avoid integration problems (missing libraries, etc.).
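For what the "link with BLAS" option would look like from Nim, here is a sketch binding CBLAS's `ddot`. The library name is system-dependent and assumed here (`libcblas.so`), so this will need adjusting to actually run on a given machine:

```nim
# Bind one CBLAS routine; the linker/loader resolves the real symbol.
# "libcblas.so" is an assumed library name and varies by platform.
proc cblas_ddot(n: cint; x: ptr cdouble; incx: cint;
                y: ptr cdouble; incy: cint): cdouble
  {.importc: "cblas_ddot", dynlib: "libcblas.so".}

var a = [1.0, 2.0, 3.0]
var b = [4.0, 5.0, 6.0]
echo cblas_ddot(3, addr a[0], 1, addr b[0], 1)  # 32.0, if the lib loads
```

A pure-Nim BLAS would expose the same procs without the `importc` pragma, giving exactly the choose-your-backend flexibility described above.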