Just as the title suggests, what is the recommended way of converting a sequence of four uint8's into a uint32? Or even more specifically, how would you convert four consecutive bytes within a larger sequence into a uint32? At the moment, I'm doing this:
var offset = 0
var x = (cast[uint32](buf[offset])   shl 0) or
        (cast[uint32](buf[offset+1]) shl 8) or
        (cast[uint32](buf[offset+2]) shl 16) or
        (cast[uint32](buf[offset+3]) shl 24)
but this doesn't seem particularly efficient since it involves four individual casts and bit ops. Any suggestions? Thanks.
Does your code work? If it does, maybe it is not that bad; the GCC optimizer is smart.
Have you tested this:
var
  i: uint32
  s: seq[byte]
  p: int

i = cast[uint32](s[p..p+3])
(I have the feeling that the +3 does not really matter for a cast.)
But no, I think that casts the address, not the content :-( I have to think about it.
Yes, after some thinking I have already edited my post: that cast gives the address.
I think I have already seen a similar working cast somewhere, but cannot find it at the moment.
My next idea was
i = (cast[ptr uint32](s[p..p+3]))[]
but that seems to be wrong also. So we have to wait for someone smarter to answer...
var
  i: uint32
  s: seq[byte]
  p: int

s = newSeq[byte]()
s.add(0)
s.add(1)
s.add(0)
s.add(0)
i = (cast[ptr uint32](addr s[p]))[]
echo i
The output is 256, so maybe not that wrong. But I am still confused; maybe Dom can add a few more pages to his book.
The dereferencing is necessary because the cast interprets the address of s[p] as a pointer to uint32. Also, if you don't actually need a sequence, just use plain arrays:
doAssert 257u == cast[uint32]([1u8, 1, 0, 0])
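If alignment of the ptr cast is a concern, a copyMem-based variant is a safe alternative. This is just a minimal sketch, not from the thread; the name toU32 is illustrative, and copyMem comes from the system module:

# Sketch of a copyMem-based conversion; it avoids dereferencing a
# possibly unaligned pointer. The proc name `toU32` is illustrative only.
proc toU32(buf: openArray[byte], offset: int = 0): uint32 =
  # copies sizeof(uint32) bytes from the buffer into `result`,
  # keeping the machine's native byte order
  copyMem(addr result, unsafeAddr buf[offset], sizeof(result))

# on a little-endian machine:
doAssert toU32([1u8, 1, 0, 0]) == 257u32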
I think this question has popped up before. I have used http://nim-lang.org/docs/endians.html. Just get the generic pointers to source and target and pass them to the standard library; it will do the swapping. Something along these lines:
import endians

proc loadHigh*[T: SomeUnsignedInt](buff: openArray[byte], value: var T,
                                   offset: int = 0): bool =
  ## Reads a big-endian `T` from `buff` starting at `offset`.
  var i: pointer = unsafeAddr buff[offset]  # openArray params are immutable, hence unsafeAddr
  var o: pointer = addr value
  result = true
  case sizeof(value)
  of 1:
    value = T(buff[offset])
  of 2:
    bigEndian16(o, i)
  of 4:
    bigEndian32(o, i)
  of 8:
    bigEndian64(o, i)
  else:
    result = false
Use littleEndian<NN>() if you need little-endian instead. Similarly, you could write a "store" function. You may also want to check that the offset does not overflow your source.
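A minimal sketch of what such a store counterpart could look like; the name storeHigh and its signature are assumptions that simply mirror loadHigh above:

import endians

# Illustrative only: a possible "store" counterpart to loadHigh; the name
# storeHigh and its signature are assumptions mirroring the proc above.
proc storeHigh*[T: SomeUnsignedInt](buff: var openArray[byte], value: T,
                                    offset: int = 0): bool =
  var v = value                   # local copy so we can take its address
  var i: pointer = addr v
  var o: pointer = addr buff[offset]
  result = true
  case sizeof(value)
  of 1:
    buff[offset] = byte(value)
  of 2:
    bigEndian16(o, i)
  of 4:
    bigEndian32(o, i)
  of 8:
    bigEndian64(o, i)
  else:
    result = false

Usage would mirror loadHigh, e.g. discard storeHigh(buf, 0xDEADBEEF'u32, 0); as with the load, the caller must make sure sizeof(T) bytes fit at the given offset.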
@Krux02, yes, the data that I will be reading varies in endianness, which adds an extra complication here: in the cases where the file's endianness does not match the machine's, I intended to perform byte swapping, roughly as sketched below. @Araq, I'm a little surprised by this; I had done a quick performance comparison of the three proposed approaches, shown after the sketch.
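A minimal sketch of that byte swapping (illustrative only: readU32 and its fileOrder parameter are assumptions; Endianness and littleEndian are from the system module, and the endians procs swap only when the source byte order differs from the machine's):

import endians

# Hypothetical helper: read a uint32 stored in the file's byte order,
# converting it to the machine's native order.
proc readU32(buf: openArray[byte], offset: int,
             fileOrder: Endianness): uint32 =
  if fileOrder == littleEndian:
    littleEndian32(addr result, unsafeAddr buf[offset])
  else:
    bigEndian32(addr result, unsafeAddr buf[offset])

And the comparison itself: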
import times, endians, strutils

when isMainModule:
  let
    numIter = 100_000_000
    offset = 1
    startTime1 = epochTime()
  var
    z: seq[byte] = @[0'u8, 1, 0, 0, 0, 0, 0, 0, 0]
    sum = 0'u64

  # Approach 1: casting followed by bit shifting
  for i in 0 ..< numIter:
    sum += (cast[uint64](z[offset])   shl 0) or
           (cast[uint64](z[offset+1]) shl 8) or
           (cast[uint64](z[offset+2]) shl 16) or
           (cast[uint64](z[offset+3]) shl 24) or
           (cast[uint64](z[offset+4]) shl 32) or
           (cast[uint64](z[offset+5]) shl 40) or
           (cast[uint64](z[offset+6]) shl 48) or
           (cast[uint64](z[offset+7]) shl 56)
  let endTime1 = epochTime()
  echo "Solution 1 required ", endTime1 - startTime1, " seconds to count to ", sum, "."

  # Approach 2: direct casting
  sum = 0
  let startTime2 = epochTime()
  let p = 1
  for i in 0 ..< numIter:
    sum += (cast[ptr uint64](addr z[p]))[]
  let endTime2 = epochTime()
  echo "Solution 2 required ", endTime2 - startTime2, " seconds to count to ", sum, "."

  # Approach 3: use of endians
  let startTime3 = epochTime()
  var v = 0'u64
  sum = 0
  for i in 0 ..< numIter:
    littleEndian64(addr v, addr z[offset])
    sum += v
  let endTime3 = epochTime()
  echo "Solution 3 required ", endTime3 - startTime3, " seconds to count to ", sum, "."
And on my machine the results are:
Solution 1 required 4.861401081085205 seconds to count to 100000000.
Solution 2 required 0.9038379192352295 seconds to count to 100000000.
Solution 3 required 2.14686393737793 seconds to count to 100000000.
The first solution was by far the slowest of the approaches, with the direct casting approach clearly the performance winner. I realize that you would know far better than I would in this situation, so is there perhaps something that I'm doing wrong above?
Stefan, you're absolutely right to mention this. In fact, my original testing was simply running the program from Atom using the Script add-on... so who knows how it was configured to run in that case. So I ran it from the command line with
nim c --run -d:release --verbosity:0 -x:off --cc:clang test.nim
and found:
Solution 1 required 0.2506728172302246 seconds to count to 100000000.
Solution 2 required 0.07390594482421875 seconds to count to 100000000.
Solution 3 required 0.2373640537261963 seconds to count to 100000000.
It clearly changed the results, but the second solution is still the fastest by far. I don't think the compiler is removing the loop, but I may be wrong.
Taking your code I get:
Solution 1 required 0.03380799293518066 seconds to count to 100000000.
Solution 2 required 0.03265786170959473 seconds to count to 100000000.
Solution 3 required 0.09943199157714844 seconds to count to 100000000.
Fedora 25, 64-bit, Nim 0.15.2, and just
nim c -d:release
Indeed, --cc:clang makes it worse:
Solution 1 required 0.2279288768768311 seconds to count to 100000000.
Solution 2 required 0.06590104103088379 seconds to count to 100000000.
Solution 3 required 0.09804606437683105 seconds to count to 100000000.