nimforum mirror - How to split a string, but keep the separator?

drifter [2014-10-17T03:58:48+02:00]

Hello,

I am trying to replicate something I am doing in Perl, which is to split a string according to a regular expression, but to keep the separator as part of the results.

For example:


import re

let currentline = "Some <bold<text> followed by <italic<text>."

# Split the current line, using <string< as the delimiter.
let tokens = split(currentline, re"<.*?<")

for token in tokens:
  echo token

In this example, the output is:


Some
text> followed by
text>.

The output I am trying to achieve is:


Some
<bold<
text> followed by
<italic<
text>.

In Perl I would just wrap the delimiter in ( ) to keep it as part of the split results. Is there a way to do something similar in Nim?

yves

gradha [2014-10-17T12:31:50+02:00]

You can try one of the variants of re.findBounds() to extract the substrings you need with the returned positions. The =~ template example also shows how to use capture groups and may help. If you look at this template's source you will see it just uses match() to extract the captures.
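To make the findBounds() idea concrete, here is a minimal sketch. It assumes the re.findBounds(s, pattern, start) overload, which returns the first and last index of the next match, or a negative first index when nothing matches. The helper name splitKeepSep is hypothetical and not part of the re module.

```nim
import re

proc splitKeepSep(s: string, sep: Regex): seq[string] =
  ## Hypothetical helper: splits `s` on `sep`, keeping each matched
  ## separator as its own item in the result.
  result = @[]
  var pos = 0
  while pos < s.len:
    let (first, last) = findBounds(s, sep, pos)
    if first < 0 or last < first:
      break                        # no match left, or an empty match
    if first > pos:
      result.add s[pos ..< first]  # text before the separator
    result.add s[first .. last]    # the separator itself
    pos = last + 1
  if pos < s.len:
    result.add s[pos .. ^1]        # trailing text after the last separator

for token in splitKeepSep("Some <bold<text> followed by <italic<text>.", re"<.*?<"):
  echo token
```

With the string from the question this should yield the five pieces asked for: the text runs with the <bold< and <italic< separators between them. The loop stops on an empty match so that a pattern able to match nothing cannot spin forever.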

drifter [2014-10-18T03:15:35+02:00]

mmm, I'm not sure I understood correctly how I would use this to achieve the desired result.

However I noticed a tokenize proc.

http://nim-lang.org/strutils.html#tokenize.i,string,set[char]

I was wondering how can I specify my own seps for this proc?

The default is set to whitespace, but how do I change it to a separator that I want to use (or actually more than one)? Suppose I want to tokenize using <, >, {, }, [ and ] as separators. How can I set seps to these?

EDIT!

Haha I figured it out!


for word in tokenize("[chapter] Some text <bold<text> followed by <italic<text>.", {'<','>','[',']'}):
  echo word.token

Which gives:


[
chapter
]
 Some text
<
bold
<
text
>
 followed by
<
italic
<
text
>
.

Excellent...now I can start experimenting :)
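For further experimenting, each item that tokenize yields also has an isSep field next to token, so separators can be told apart from ordinary text. A small sketch, where the variable names are only for illustration:

```nim
import strutils

let line = "Some <bold<text> followed by <italic<text>."
var seps: seq[string] = @[]
var texts: seq[string] = @[]

for word in tokenize(line, {'<', '>'}):
  if word.isSep:
    seps.add word.token    # a run of separator characters
  else:
    texts.add word.token   # the text between separators

echo seps
echo texts
```

One thing to watch: tokenize groups consecutive separator characters into a single token, so an input like "<<" comes back as one item rather than two.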

Mirror of forum.nim-lang.org
