Just as the title suggests, what is the recommended way of converting a sequence of four uint8's into a uint32? Or even more specifically, how would you convert four consecutive bytes within a larger sequence into a uint32? At the moment, I'm doing this:
var offset = 0
var x = (cast[uint32](buf[offset])   shl 0) or
        (cast[uint32](buf[offset+1]) shl 8) or
        (cast[uint32](buf[offset+2]) shl 16) or
        (cast[uint32](buf[offset+3]) shl 24)
but this doesn't seem particularly efficient since it involves four individual casts and bit ops. Any suggestions? Thanks.
Does your code work? If it does, maybe it is not that bad; the GCC optimizer is smart.
Have you tested this:
var
  i: uint32
  s: seq[byte]
  p: int

i = cast[uint32](s[p..p+3])
(I have the feeling that the +3 does not really matter for a cast.)
But no, I think that casts the address, not the content :-( I have to think about it.
Yes, after some thinking I have already edited my post: that cast gives the address.
I think I have already seen a similar working cast somewhere, but cannot find it at the moment.
My next idea was
i = (cast[ptr uint32](s[p..p+3]))[]
but that seems to be wrong also. So we have to wait for someone smarter to answer...
var
  i: uint32
  s: seq[byte]
  p: int

s = newSeq[byte]()
s.add(0)
s.add(1)
s.add(0)
s.add(0)
i = (cast[ptr uint32](addr s[p]))[]
echo i
The output is 256, so maybe not that wrong. But I am still confused; maybe Dom can add a few more pages to his book.
The dereferencing is necessary because the cast interprets the address of s[p] as a pointer to uint32. Also, if you don't actually need a sequence, just use plain arrays:
doAssert 257u == cast[uint32]([1u8, 1, 0, 0])
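If alignment of the ptr cast is a concern, a copyMem-based variant is a safe alternative. This is just a minimal sketch, not from the thread; the name toU32 is illustrative, and copyMem comes from the system module:

# Sketch of a copyMem-based conversion; it avoids dereferencing a
# possibly unaligned pointer. The proc name `toU32` is illustrative only.
proc toU32(buf: openArray[byte], offset: int = 0): uint32 =
  # copies sizeof(uint32) bytes from the buffer into `result`,
  # keeping the machine's native byte order
  copyMem(addr result, unsafeAddr buf[offset], sizeof(result))

# on a little-endian machine:
doAssert toU32([1u8, 1, 0, 0]) == 257u32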
I think this question has popped up before. I have used http://nim-lang.org/docs/endians.html. Just get the generic pointers to source and target and pass them to the standard library; it will do the swapping. Something along these lines:
import endians

proc loadHigh*[T: SomeUnsignedInt](buff: openArray[byte], value: var T,
                                   offset: int = 0): bool =
  ## Reads a big-endian `T` from `buff` starting at `offset`.
  var i: pointer = unsafeAddr buff[offset]  # openArray params are immutable, hence unsafeAddr
  var o: pointer = addr value
  result = true
  case sizeof(value)
  of 1:
    value = T(buff[offset])
  of 2:
    bigEndian16(o, i)
  of 4:
    bigEndian32(o, i)
  of 8:
    bigEndian64(o, i)
  else:
    result = false
Use littleEndian<NN>() if you need little-endian instead. Similarly, you could write a "store" function. You may also want to check that the offset does not overflow your source.
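A minimal sketch of what such a store counterpart could look like; the name storeHigh and its signature are assumptions that simply mirror loadHigh above:

import endians

# Illustrative only: a possible "store" counterpart to loadHigh; the name
# storeHigh and its signature are assumptions mirroring the proc above.
proc storeHigh*[T: SomeUnsignedInt](buff: var openArray[byte], value: T,
                                    offset: int = 0): bool =
  var v = value                   # local copy so we can take its address
  var i: pointer = addr v
  var o: pointer = addr buff[offset]
  result = true
  case sizeof(value)
  of 1:
    buff[offset] = byte(value)
  of 2:
    bigEndian16(o, i)
  of 4:
    bigEndian32(o, i)
  of 8:
    bigEndian64(o, i)
  else:
    result = false

Usage would mirror loadHigh, e.g. discard storeHigh(buf, 0xDEADBEEF'u32, 0); as with the load, the caller must make sure sizeof(T) bytes fit at the given offset.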
@Krux02, yes, the data that I will be reading varies in endianness, which adds an extra complication here: in the cases where the file's endianness does not match the machine's, I intended to perform byte swapping, roughly as sketched below. @Araq, I'm a little surprised by this; I had done a quick performance comparison of the three proposed approaches, shown after the sketch.
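A minimal sketch of that byte swapping (illustrative only: readU32 and its fileOrder parameter are assumptions; Endianness and littleEndian are from the system module, and the endians procs swap only when the source byte order differs from the machine's):

import endians

# Hypothetical helper: read a uint32 stored in the file's byte order,
# converting it to the machine's native order.
proc readU32(buf: openArray[byte], offset: int,
             fileOrder: Endianness): uint32 =
  if fileOrder == littleEndian:
    littleEndian32(addr result, unsafeAddr buf[offset])
  else:
    bigEndian32(addr result, unsafeAddr buf[offset])

And the comparison itself: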
import times, endians, strutils

when isMainModule:
  let
    numIter = 100_000_000
    offset = 1
    startTime1 = epochTime()
  var
    z: seq[byte] = @[0'u8, 1, 0, 0, 0, 0, 0, 0, 0]
    sum = 0'u64

  # Approach 1: casting followed by bit shifting
  for i in 0 ..< numIter:
    sum += (cast[uint64](z[offset])   shl 0) or
           (cast[uint64](z[offset+1]) shl 8) or
           (cast[uint64](z[offset+2]) shl 16) or
           (cast[uint64](z[offset+3]) shl 24) or
           (cast[uint64](z[offset+4]) shl 32) or
           (cast[uint64](z[offset+5]) shl 40) or
           (cast[uint64](z[offset+6]) shl 48) or
           (cast[uint64](z[offset+7]) shl 56)
  let endTime1 = epochTime()
  echo "Solution 1 required ", endTime1 - startTime1, " seconds to count to ", sum, "."

  # Approach 2: direct casting
  sum = 0
  let startTime2 = epochTime()
  let p = 1
  for i in 0 ..< numIter:
    sum += (cast[ptr uint64](addr z[p]))[]
  let endTime2 = epochTime()
  echo "Solution 2 required ", endTime2 - startTime2, " seconds to count to ", sum, "."

  # Approach 3: use of endians
  let startTime3 = epochTime()
  var v = 0'u64
  sum = 0
  for i in 0 ..< numIter:
    littleEndian64(addr v, addr z[offset])
    sum += v
  let endTime3 = epochTime()
  echo "Solution 3 required ", endTime3 - startTime3, " seconds to count to ", sum, "."
And on my machine the results are:
Solution 1 required 4.861401081085205 seconds to count to 100000000.
Solution 2 required 0.9038379192352295 seconds to count to 100000000.
Solution 3 required 2.14686393737793 seconds to count to 100000000.
The first solution was by far the slowest of the approaches, with the direct casting approach clearly the performance winner. I realize that you would know far better than I would in this situation, so is there perhaps something that I'm doing wrong above?
Stefan, you're absolutely right to mention this. In fact, my original testing was simply running the program from Atom using the Script add-on... so who knows how it was configured to run in that case. So I ran it from the command line with
nim c --run -d:release --verbosity:0 -x:off --cc:clang test.nim
and found:
Solution 1 required 0.2506728172302246 seconds to count to 100000000.
Solution 2 required 0.07390594482421875 seconds to count to 100000000.
Solution 3 required 0.2373640537261963 seconds to count to 100000000.
It clearly changed the results, but the second solution is still the fastest by far. I don't think the compiler is removing the loop, but I may be wrong.
Taking your code I get:
Solution 1 required 0.03380799293518066 seconds to count to 100000000.
Solution 2 required 0.03265786170959473 seconds to count to 100000000.
Solution 3 required 0.09943199157714844 seconds to count to 100000000.
Fedora 25, 64-bit, Nim 0.15.2, and just
nim c -d:release
Indeed, --cc:clang makes it worse:
Solution 1 required 0.2279288768768311 seconds to count to 100000000.
Solution 2 required 0.06590104103088379 seconds to count to 100000000.
Solution 3 required 0.09804606437683105 seconds to count to 100000000.