nimforum mirror - Proposal of new functions in stdlib

zielmicha [2015-11-02T20:39:33+01:00]

In my opinion, some important functions are missing from the stdlib. I will create a PR for each one if there is interest.

In system:

proc `&=`*[T](a: var seq[T], b: seq[T]) =
  for i in b:
    a.add(i)

In sequtils:

proc flatten*[T](a: seq[seq[T]]): seq[T] =
  result = @[]
  for subseq in a:
    result &= subseq

In strutils:

proc encodeHex*(s: string): string =
  const hexLetters = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f']
  result = ""
  result.setLen(s.len * 2)
  for i in 0..s.len-1:
    var a = ord(s[i]) shr 4
    var b = ord(s[i]) and ord(0x0f)
    result[i * 2] = hexLetters[a]
    result[i * 2 + 1] = hexLetters[b]

Araq [2015-11-02T21:50:00+01:00]

encodeHex should use result = newString(s.len * 2) but bring it on! :-)
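
A minimal sketch of what that suggestion might look like, assuming the only change is to allocate the result once with newString. The name encodeHexSketch is made up here so it does not clash with anything in strutils, and the lookup table is written as a string constant purely for brevity:

```nim
# Hypothetical variant of the proposed encodeHex: the output is
# allocated up front with newString instead of "" plus setLen.
proc encodeHexSketch(s: string): string =
  const hexLetters = "0123456789abcdef"
  result = newString(s.len * 2)
  for i, c in s:
    let b = ord(c)                              # byte value, 0..255
    result[i * 2] = hexLetters[b shr 4]         # high nibble
    result[i * 2 + 1] = hexLetters[b and 0x0f]  # low nibble
```

Each input byte becomes two lowercase hex digits, so the result is always exactly twice as long as the input.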

Arrrrrrrrr [2015-11-02T22:00:01+01:00]

I miss some convenient procs for bools:

template `|=`*(a: var bool, b: bool) = a = a or b
template `&=`*(a: var bool, b: bool) = a = a and b

I don't like having to write myLongBool = myLongBool or myOtherBool
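
A short usage sketch of the `|=` template, assuming it is defined locally rather than in system; the variable name anyNegative and the sample data are invented for illustration:

```nim
# Local definition of the proposed template (not part of the stdlib).
template `|=`(a: var bool, b: bool) = a = a or b

var anyNegative = false
for x in [3, -1, 4]:
  # `|=` parses as an assignment operator, so `x < 0` is evaluated first.
  anyNegative |= x < 0
```

After the loop, anyNegative is true because one element is below zero.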

repax [2015-11-02T23:37:29+01:00]

Perhaps setLen could be used to preallocate space.

cblake [2015-11-03T00:09:34+01:00]

Perhaps you could do:

proc `&=`*[T](a: var seq[T], b: openarray[T])
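
Both suggestions (preallocating with setLen and taking an openarray) could be combined roughly as follows. This is only a sketch: the proc is named appendAll here, a made-up name, to avoid clashing with any existing `&=`, and no claim is made that it is measurably faster than repeated add:

```nim
# Hypothetical helper: grow the target once, then copy elements in place.
proc appendAll[T](a: var seq[T], b: openArray[T]) =
  let start = a.len
  a.setLen(start + b.len)      # single resize instead of one per element
  for i in 0 ..< b.len:
    a[start + i] = b[i]

var s = @[0, 1]
s.appendAll([2, 3])            # array argument
s.appendAll(@[4, 5])           # seq argument
```

Because the parameter is an openArray, the same proc accepts both arrays and seqs.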

Araq [2015-11-03T01:20:37+01:00]

The only reason why '&=' has a chance of getting into system.nim is that it already exists for strings and right now it's just inconsistent. Apart from that system.nim is already too bloated for my taste and of course everybody wants to have his favourite pet feature in system.nim.

What is not widely known, however, is that you can have your own system.nim-like modules via --import (and of course you can put that into your config too).

Jehan [2015-11-03T01:28:33+01:00]

I suspect the following implementation of flatten is likely to be faster:

proc flatten*[T](a: seq[seq[T]]): seq[T] =
  var k = 0
  for subseq in a:
    k += len(subseq)
  result = newSeq[T](k)
  k = 0
  for subseq in a:
    for elem in subseq:
      result[k] = elem
      k += 1

filwit [2015-11-03T01:32:00+01:00]

@cblake +1

It would also be nice if it worked with single objects, so:

proc `&=`*[T](a: var seq[T], b: T) {.inline.} = a.add(b)
proc `&=`*[T](a: var seq[T], b: openarray[T]) = ...

But currently the & procedure doesn't work with openarray. Perhaps there's good reason for this, but if not it would be nice if it did for consistency:

var s = @[0]
s = s & 1
s = s & [2, 3] # this currently fails
s = s & @[4, 5]
s &= 6
s &= [7, 8]
s &= @[9, 10]

Somewhat related, there are a couple of oddities with the seq procedures in system. First, why do len and xlen return different values?

var s = newSeq[int](10)
echo s.len  # 10
echo s.xlen # 9

Second, why don't the add procedures follow len's example and do nil-checks for you? It seems the ideal place to do checks, since add potentially allocates memory anyway. If it checked and auto-allocated for you (with an xadd alternative, like xlen), I think it would avoid one of the biggest new-user gotchas in Nim.

var s: seq[int]
echo s.len # prints '0'
s.add(1) # runtime exception (this should work!)

echo s.xlen # unchecked
s.xadd(2) # unchecked

renoX [2015-11-03T11:06:08+01:00]

@Arrrrrrrrr IMHO &= for booleans would be weird if &= is also used for concatenation; the two operations are quite different.

andrea [2015-11-03T11:08:48+01:00]

By the way, flatten is not the only missing piece of functionality for sequence operations. Peter Mora had a consistent suggestion for things to add to sequtils, based on Clojure collections, but that proposal seems to be stuck now. Maybe it would be worth discussing it again.

Arrrrrrrrr [2015-11-03T11:37:07+01:00]

@renoX: I don't mind the name; I only care about the functionality. I suggested &= because you cannot define and=.

Jehan [2015-11-03T11:49:24+01:00]

filwit: Second, why don't the add procedures follow len's example and do nil-checks for you? It seems the ideal place to do checks, since add potentially allocates memory anyway. If it checked and auto-allocated for you (with an xadd alternative, like xlen), I think it would avoid one of the biggest new-user gotchas in Nim.

Because this behavior can mask errors. (I'm personally not so crazy about the new behavior of len, either, but for other reasons.)

filwit [2015-11-03T12:53:31+01:00]

Jehan: Because this behavior can mask errors.

Okay, sure, but then what's the justification for len doing it? Surely there are also errors masked by similar use of len with nil sequences.

Jehan [2015-11-04T12:40:54+01:00]

@filwit: As I said, I'm not that crazy about len (or high) working for nil. The rationale is that it lets nil represent the empty string/sequence more efficiently. But at the same time, whether a procedure/iterator works for an empty string or not now becomes an implementation detail, which leads to fragile code.

For example, the following code works with nil, but breaks when you replace the seq argument type with an openarray one:

proc sum[T](s: seq[T]): T =
  for i in 0..high(s):
    result += s[i]

var s: seq[int]
echo sum(s)

And there are plenty of other problematic cases. For example, substr() or slicing will break on nil, because they access the length field in the string directly rather than calling len(). Once you start down that road, the requirement to treat a nil as a string becomes contagious, infecting everything it comes into contact with.

Hans [2015-11-22T14:15:07+01:00]

Something fundamental which for some reason many languages since BASIC seem to leave out of their core:

const int_bits_minus_1 = 8 * sizeof(int) - 1
proc sgn*(a: int): int {.noSideEffect.} =
  return (-a shr int_bits_minus_1) - (a shr int_bits_minus_1)

This and homonymous functions for other number types would nicely complement abs(), which is already in system. A non-branching signum function such as this can make a significant difference for the performance of an inner loop. Once it's in system, I hope there is also a chance for it to become 'magic' some day, with even better assembler-based implementations.

Jehan [2015-11-24T12:35:28+01:00]

@Hans: A simpler and (arguably) more portable implementation of sgn() would be the following:

proc sgn*(a: int): int {.noSideEffect, inline.} =
  int(a > 0) - int(a < 0)

That said, it's difficult to predict performance given the complexities of modern optimizing compilers: using clang, the following code actually seems to be (marginally) faster:

proc sgn*(a: int): int {.noSideEffect, inline.} =
  if a > 0: 1
  elif a < 0: -1
  else: 0

However, with gcc 5.1 it's significantly slower, and it's also slower with clang if you change the order of the if branches.
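
Whichever variant turns out faster on a given compiler, the two can at least be checked against each other. A small sketch, with the procs renamed to sgnArith and sgnBranch (names invented here) so both can live in one module:

```nim
# The two sgn variants from this post, renamed so they can coexist.
proc sgnArith(a: int): int {.noSideEffect, inline.} =
  int(a > 0) - int(a < 0)

proc sgnBranch(a: int): int {.noSideEffect, inline.} =
  if a > 0: 1
  elif a < 0: -1
  else: 0

# Compare both variants on a few values, including the extremes of int.
for x in [low(int), -5, -1, 0, 1, 7, high(int)]:
  doAssert sgnArith(x) == sgnBranch(x)
```

This only checks that the results agree; it says nothing about relative speed, which would need a proper benchmark on the target compiler.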

Mirror of forum.nim-lang.org
