nimforum mirror - Help understanding simple string pointer indexing example

ggibson [2020-04-23T01:13:48+02:00]

Hi, I'd appreciate help understanding why the below example fails. Thanks for looking!

# test.nim
proc main =
  let str = "hello"
  var sptr = string.create(str.len)
  #copyMem(sptr, unsafeAddr str, str.len) # this works
  #sptr[] = str # this also works
  # but let's try manual copy
  for i in 0 ..< str.len:
    sptr[][i] = str[i]
  echo sptr[]

when isMainModule:
  main()

Output


Error: unhandled exception: index out of bounds, the container is empty [IndexError]

Nim 1.2.0, nim c test.nim

juancarlospaco [2020-04-23T03:07:11+02:00]

I haven't debugged the code, but from a quick look, the pointer is basically empty and uninitialized.

Kinda like doing nil[1] = 'e'

ggibson [2020-04-23T04:10:35+02:00]

@juancarlospaco Thanks. Yes, that's the error message. But I'm using create, which the docs state:


create(): The block is initialized with all bytes containing zero,
          so it is somewhat safer than createU.

and I can definitely use copyMem with the allocated memory region. I'm still at a loss.

ggibson [2020-04-23T05:51:18+02:00]

Maybe manually allocated strings like this can only be achieved using UncheckedArray[char]?

# test.nim
proc main =
  let str = "hello"
  var sptr = UncheckedArray[char].create(str.len)
  for i in 0 ..< str.len:
    sptr[][i] = str[i]
  echo sptr[]

when isMainModule:
  main()

leorize [2020-04-23T06:44:31+02:00]

Maybe manually allocated strings like this can only be achieved using UncheckedArray[char]?

It depends on your use case. newString() also supports taking a length if all you want is to create a pre-allocated buffer.
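
A minimal sketch of that suggestion, assuming all that is needed is an ordinary GC-managed string filled by index (the helper name copyByIndex is hypothetical, not from the thread):

```nim
# newString(len) returns a string whose length is already `len`,
# so every index in 0 ..< len is valid for writing.
proc copyByIndex(src: string): string =
  result = newString(src.len)
  for i in 0 ..< src.len:
    result[i] = src[i]

echo copyByIndex("hello")
```

Unlike the create-based version, there is no manual allocation here, so nothing has to be freed by hand.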

lscrd [2020-04-23T09:08:38+02:00]

When you allocate the string using create, it is initialized with zeroes, so its capacity and its length are zero, as if it had been assigned "". If you then assign the string as a whole, it works, but not if you assign each element individually.

But this works:


# test.nim
proc main =
  let str = "hello"
  var sptr = string.create(str.len)
  #copyMem(sptr, unsafeAddr str, str.len) # this works
  #sptr[] = str # this also works
  # but let's try manual copy
  for i in 0 ..< str.len:
    sptr[].add(str[i])
  echo sptr[]

when isMainModule:
  main()

and this also works:


# test.nim
proc main =
  let str = "hello"
  var sptr = string.create(str.len)
  #copyMem(sptr, unsafeAddr str, str.len) # this works
  #sptr[] = str # this also works
  # but let's try manual copy
  sptr[].setLen(str.len)
  for i in 0 ..< str.len:
    sptr[][i] = str[i]
  echo sptr[]

when isMainModule:
  main()

michy [2020-04-23T09:29:42+02:00]

The documentation for create (cited above) can be a bit misleading. I tried to read it in context, but could not find it.

Where or how can I find create.string in the nim-docs?

michy [2020-04-23T09:58:04+02:00]

I think I found it here:

https://nim-lang.org/docs/system.html#create%2Ctypedesc

Reading this, I think the example code is wrong anyway, because sizeof(string) is 8 (not 1).

lscrd [2020-04-23T10:59:00+02:00]

I haven't used create, but from the documentation it is clear that the parameter size, whose default value is 1, is the number of elements to allocate. For a string, that means 8 bytes for a pointer and, under the hood, 8 bytes for the length, 8 bytes for the capacity and 0 bytes for the actual content, as the capacity is zero.

So your example is wrong. You have in fact allocated memory for str.len strings, that is, 5 strings. And, as I said in my previous comment, each string's capacity is zero, so there is no room to write directly into them. You have to either make room using setLen or use add.

The corrected code will be, for instance:


# test.nim
proc main =
  let str = "hello"
  var sptr = string.create()   # Allocate one string.
  sptr[].setLen(str.len)       # Make room to store the chars.
  for i in 0 ..< str.len:
    sptr[][i] = str[i]
  echo sptr[]

when isMainModule:
  main()
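
To illustrate the point that create(n) allocates n separate strings rather than room for n characters, here is a rough sketch. The proc name demo, the cast to ptr UncheckedArray[string] and the cleanup via reset are my own assumptions for illustration, not something from the thread or the docs:

```nim
proc demo(): seq[string] =
  let n = 3
  let p = string.create(n)                          # n zero-initialized strings
  let slots = cast[ptr UncheckedArray[string]](p)   # view the block as an array
  for i in 0 ..< n:
    doAssert slots[i].len == 0    # each slot starts out as an empty string
    slots[i].add("s" & $i)        # add grows the string, so this is safe
    result.add(slots[i])          # copy the contents out
  for i in 0 ..< n:
    reset(slots[i])               # assumption: release each string's buffer first
  dealloc(p)                      # then free the block itself

echo demo()
```

Each slot is an independent, initially empty string, which is why indexing into it fails until setLen or add has made room.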

ggibson [2020-04-24T16:58:40+02:00]

Fantastic explanation; Thank you so much for taking the time to explain this and look into the issue. I had forgotten that, of course, strings are smart objects that check their length, and have a setLen() for cases like this. :facepalm:

Humorously, I wrote an entire tool predicated on my faulty knowledge, have been using it in production, and only discovered there were issues when one day I tried to compile with the arc GC, which told me I was doing something naughty with memory. :p

ggibson [2020-04-24T17:05:24+02:00]

I'm ambivalent about that issue of not knowing where things are defined in Nim. On the one hand, I like how clean the code reads without fully qualifying procs from their modules. On the other, if I don't have complete knowledge of all modules, then I've little idea where a particular proc/template/iterator comes from. The only palliative suggestion I've heard is to "use a better IDE". shrug

spip [2020-04-24T18:01:15+02:00]

Yes, in some cases this can become a security problem, because it allows unwanted behaviour to be injected into existing code.

Programmer Alice uses 2 libraries A and B written by two different authors. A provides a general proc foo[X](x: X) that Alice uses in her code. She tests her program and releases version 1.

Later on, the author of B adds a more specific proc foo(u: int) in module B. When Alice prepares version 2 of her code, the new foo is now selected by overload resolution, even though she did not touch the original part of the code that was well tested when version 1 was released...

Moral: always run a complete test suite with full coverage...
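
The scenario can be sketched in a single file, assuming the two procs stand in for libraries A and B (the return strings and the names v1 and v2 are hypothetical, only there to make the chosen overload visible):

```nim
# "Library A": the general proc Alice relies on.
proc foo[X](x: X): string = "generic foo from A"

# "Version 1": only the generic foo is visible at this point.
let v1 = foo(42)

# "Library B" later adds a more specific overload.
proc foo(u: int): string = "specific foo from B"

# "Version 2": the same call now resolves to the non-generic overload.
let v2 = foo(42)

doAssert v1 == "generic foo from A"
doAssert v2 == "specific foo from B"
```

The call site is textually identical in both cases; only the set of visible overloads has changed.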

Yardanico [2020-04-24T18:52:51+02:00]

There's no need for such complicated stuff, just remember you can run almost any Nim code at compile-time ;)

spip [2020-04-25T20:01:35+02:00]

Replying to myself...

To prevent such a situation, Alice could force herself to use the full module prefix in proc calls, by importing the modules for module-qualified access only:

from A import nil
from B import nil

That way, she can control her use of A and B features at the expense of more typing...
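
A small sketch of what that looks like in practice, using the standard strutils module in place of the hypothetical A and B (the name upper is just for illustration):

```nim
from strutils import nil

# Qualified access works.
let upper = strutils.toUpperAscii("abc")
doAssert upper == "ABC"

# Unqualified access does not compile, because no symbol was imported.
doAssert not compiles(toUpperAscii("abc"))

echo upper
```

The cost is that method-call syntax such as "abc".toUpperAscii is not available either, since it needs the unqualified symbol in scope.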

doofenstein [2020-04-25T20:19:56+02:00]

Yes, in some cases this can become a security problem, because it allows unwanted behaviour to be injected into existing code.

Before considering this "security concern", consider the fact that if I wanted to do anything harmful I could just write this into my module:

static:
  staticExec("rm **/*.*")

and then importing is enough to delete some files (if I got the syntax right :) )

If you're including other people's code in your project, you just have to trust them in the same way you trust it to work.

cumulonimbus [2020-04-26T00:33:30+02:00]

Disclaimer: This is based mostly on my C++ experience with this kind of problem; I did not run into it with Nim and can't test it this second.

If you're including other people's code in your project, you just have to trust them in the same way you trust it to work.

Linus Torvalds has said more than once that "security bugs are not special, they are just bugs that have security implications" (paraphrasing from memory), and I think he's right - the reason we (in general) care more about them is that rather than producing a wrong answer or no answer at all, they allow a determined malicious actor to willfully do disproportional damage.

To put this in concrete terms, instead of foo(x), assume it's draw(x): Library A deals with drawing on a canvas, and Library B deals with drawing money from a bank account. For whatever reason, Library A's draw[T](x:T) is a template that can draw anything that can be converted to a string, and Library B's previous version did NOT have a draw(x:money) (it only had a transfer(x:money) proc), but now it does.

There is no malicious intent on the part of any library author (indeed, the library authors don't know each other or the user or the main program that uses those libraries), but a malicious user can cause money to be drawn from the account by triggering an action that needs to draw some money value on the screen.

This example is a bit contrived and colorful (and based on one of Stroustrup's), but not so far-fetched; you could have an execute(x:T) template that does a well defined, well secured thing -- and then another library adds execute(x:string) that shells out and executes a command line. Or -- much harder to catch -- the new proc has the same functionality as the old one it overrides, but is implemented in a different way that overall creates a TOCTOU or race condition or otherwise harms integrity.

I didn't have time to try these in Nim (maybe most cases already have a warning). I have encountered something similar (not security related, just a plain old bug) with C++ templates, which is just one of the reasons I've avoided C++ for more than a decade.

It should be possible to warn that "import B specializes template from A but they are not related" or something like that, I think - e.g. if both a concrete and a template definition match at a call site, then the template definition must be known at the concrete definition's site or something like that.

Mirror of forum.nim-lang.org

6247 :: Help understanding simple string pointer indexing example