nimforum mirror - Why splitWhitespace() from strutils lacks maxsplit parameter?

olwi [2017-10-12T22:17:22+02:00]

Couldn't render post #20431.

mashingan [2017-10-13T09:28:51+02:00]

Use the split proc:

proc split(s: string; seps: set[char] = Whitespace; maxsplit: int = -1): seq[string] {..}

To use it:

from strutils import split

for token in "My string when splitted".split(maxsplit = 1):
  echo token

olwi [2017-10-13T10:05:55+02:00]

split on Whitespace and splitWhitespace are not equivalent:

from strutils import split, splitWhitespace
let s = "  a couple of \t words "
echo s.split.len            # prints 9
echo s.splitWhitespace.len  # prints 4

With leading whitespace, split(maxsplit = 1)[0] is an empty string, whereas I expect splitWhitespace(maxsplit = 1)[0] to be the first non-whitespace token in the string. Sure, one can strip the leading whitespace before using split(), etc.
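A minimal sketch of that difference and of the strip-first workaround, using only split and strip from strutils (the variable names are just for illustration):

```nim
import strutils

let s = "  a couple of \t words "

# Splitting directly: the leading space is itself a separator,
# so the first piece is the empty string.
let direct = s.split(maxsplit = 1)

# Workaround: drop the leading whitespace first, then split once.
let stripped = s.strip(trailing = false).split(maxsplit = 1)

echo direct[0].len  # 0
echo stripped[0]    # a
```

Note that the remainder in stripped[1] is left untouched, so it still contains the tab and the trailing space.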

But as far as I can see, all the functionality for splitWhitespace(maxsplit = <something>) is already there; it is just not exposed via the public splitWhitespace interface.

The question is "why?" Is it buggy or what?

olwi [2017-11-08T13:11:54+01:00]

The optional maxsplit parameter was added to strutils.splitWhitespace.

For details see: https://github.com/nim-lang/Nim/issues/6503
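A short usage sketch, assuming a Nim version recent enough to include that change:

```nim
import strutils

let s = "  a couple of \t words "

# Leading whitespace is skipped, then at most one split is made.
let parts = s.splitWhitespace(maxsplit = 1)

echo parts[0]   # a
echo parts.len  # 2
```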

Udiknedormin [2017-11-09T00:13:15+01:00]

@olwi

I'd say it looks like a bug. I always use split though.

olwi [2017-11-09T00:21:09+01:00]

@Udiknedormin

What exactly looks like a bug?

Udiknedormin [2017-11-09T09:05:52+01:00]

That split can't behave like splitWhitespace. I guess it should be able to, and simply be more general.

Well, there is a similar library in Fortran. If I recall correctly, it has a function somewhat similar to split (it's also an iterator). It separates the concept of separator characters from that of ignored (unmatched) characters. So it would be something like this:

echo "  a couple of \t words ".split(sep = Whitespace)
# @[, , a, couple, of, , , words, ]
echo "  a couple of \t words ".split(ignore = Whitespace)
# @[a, couple, of, words]

olwi [2017-11-09T21:43:43+01:00]

Well, I think it is possible to merge splitWhitespace into split like this:

1) s.split() splits on whitespace (the way splitWhitespace does)

2) all other forms of split work like they do now. To get the current default behaviour of split, one should use s.split(Whitespace). In other words:

echo "  a couple of \t words ".split()
# @[a, couple, of, words]
echo "  a couple of \t words ".split(Whitespace)
# @[, , a, couple, of, , , words, ]

This is easy to implement, but as far as I understand that would constitute a breaking change...

Udiknedormin [2017-11-10T08:43:16+01:00]

My version would not. :) Old code can't use parameters that didn't exist at the time, so you'd just have to add another split argument that works like splitWhitespace does today, and then make splitWhitespace an alias for some split call (with a deprecation annotation) for backwards compatibility.
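A rough sketch of how that could look. The names splitSketch and splitWhitespaceCompat and the ignore parameter are hypothetical and are not part of strutils; the sketch also leaves out maxsplit:

```nim
import strutils

# Hypothetical wrapper, not the real strutils.split.
# With an empty `ignore` set it behaves like today's split;
# otherwise tokens are the maximal runs of characters not in `ignore`.
proc splitSketch(s: string; seps: set[char] = Whitespace;
                 ignore: set[char] = {}): seq[string] =
  if card(ignore) == 0:
    result = s.split(seps)
  else:
    var token = ""
    for c in s:
      if c in ignore:
        if token.len > 0:
          result.add token
          token = ""
      else:
        token.add c
    if token.len > 0:
      result.add token

# Hypothetical backwards-compatible alias, marked as deprecated.
proc splitWhitespaceCompat(s: string): seq[string] {.deprecated.} =
  splitSketch(s, ignore = Whitespace)

echo splitSketch("  a couple of \t words ", ignore = Whitespace)
# @["a", "couple", "of", "words"]
```

Existing calls that never pass ignore keep their current results, which is what makes this variant non-breaking.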

DeletedUser [2017-12-05T23:40:20+01:00]

Maybe add a noEmpties: bool = false argument to split that, when true, returns the result without the empty strings (the splitWhitespace behaviour).
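A minimal sketch of that idea as a separate proc. The name splitNoEmpties is hypothetical, noEmpties is the flag proposed above, and nothing here is part of strutils:

```nim
import strutils

# Hypothetical helper: like split, but can drop empty pieces.
proc splitNoEmpties(s: string; seps: set[char] = Whitespace;
                    noEmpties: bool = false): seq[string] =
  for part in s.split(seps):
    if noEmpties and part.len == 0:
      continue  # skip the empty strings between adjacent separators
    result.add part

let s = "  a couple of \t words "
echo splitNoEmpties(s).len                    # 9
echo splitNoEmpties(s, noEmpties = true).len  # 4
```

The sketch does not take maxsplit, so it says nothing about how the two options should interact.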

Mirror of forum.nim-lang.org
