nimforum mirror - How to write (mostly) "architecture independent" (32/64 bits) code?

monster (orginal) [2017-10-08T13:13:54+02:00] view original

I'm currently working under the assumption that the first application I'm going to write in Nim is going to be a client-server setup, where the client is 32-bit mobile, and the server is 64-bit Linux.

My "problem" is that I want both sides to behave in exactly the same way. Mathematical computations should produce exactly the same result. And if the client is going to crash and burn with an integer overflow, then I want the server to also crash and burn with the same data at the same location. I don't want to have to run separate tests for the client and the server; I should expect the same results everywhere.

Since one cannot rely on getting the same exact floating point results on two totally different processors, with different OS and standard math library function implementations, I'm going to try and not use any floating point data at all. In fact, I'm going to check later if I can use fixed-point in Nim somehow.

The size of references/pointers is also going to be different. There is clearly nothing I can do there. Hopefully, I'll settle for some serialization library that takes care of this issue.

What "bothers" me is, that the default integer type, int, is architecture-dependent. I has to be, so that one can talk to the OS and standard libraries. But this also means that if I simply use int in my code, the size of data-structures, their alignment, and the points at which an integer overflows might happen, will differ between the client and the server. And that, is exactly the opposite of what I want to achieve.

I could try to always explicitly use int32 and 42'i32 (or int64 and 42'i64) in my code, but beyond the fact that it would make the code more verbose, the bigger problem is that I might forget to do that somewhere, and I won't even know until I get an inexplicable "de-sync" between the client and the server.

So, is there a way I could force a specific size for "int" per module, or some way I could get the compiler to produce an error, if it sees "raw int" (or float) used anywhere in my code (with, obviously, some way to disable that check for code that talks directly with other libraries or the OS)?

LeuGim (orginal) [2017-10-08T14:35:51+02:00] view original

You can make int whatever you want per a module, just by declaring it (type int = ...). But that won't change types of literals. So then you should avoid implicit typing. Yet stdlib of coarse uses int (you can declare your int in system.int ot make small wrappers for stdlib modules, which just declare your types and then include the module (include strutils), but implicit types may be used there too). E.g.:

type int = int8

var x: int = 8
echo x.sizeof # -> 1
var y = 8
echo y.sizeof # -> 4, 8, ..., `system.int`'s size, platform-dependent

Libman (orginal) [2017-10-08T15:42:54+02:00] view original

I never had this problem, but I've heard of NASA software projects that were commanded to use an arbitrary precision library for absolutely positively everything numeric. I think they had an auditing tool that listed all types used anywhere in the program, and alerted about the "bad" native types. That was the basis of many jokes about that 4,294,967,296th engine not breaking the code.. :P

monster (orginal) [2017-10-08T16:49:17+02:00] view original

@Libman Whether it's important to you totally depends on what you are working on. I want to work on a (soft) real-time client/server (or more precisely client/distributed-server) simulation (aka "game"). And I only want to send the "commands" over the network, rather than every change that happens on the server (partly, because I want to use voxels, rather then the usual "static game map", and sending the voxels changes over the wire would be very expensive). So, given the same starting data, and a theoretically unlimited list of "commands", the state of the client must be the same as the state of the server after executing all those commands. I'll just call that reproducibility, although I think there's a better word that I can't remember right now. I haven't tried it yet, but I read a lot about it, and it's a lot harder then one would think.

When it comes to floating point, I must say that I normally program in Java (at work), and even "Write-Once-Run-Anywhere" Java tells you that floating-point operations cannot be expected to give the same results on different computers (which is why they have StrictMath too). So I'm really not going to assume that it "simply works" in C, which doesn't even try to give you such guaranties.

I don't need "arbitrary precision", luckily. I can live with "imprecise" results; I just need to get the same (imprecise) result all the time, everywhere. Atm, I don't think I'll need to do much non-integer math (with one major exception), so fixed-point will do nicely. And I think Nim can probably make the fixed-point usage trivial with a few templates/macros.

Apart the fact that I'd like to use a "physics engine" too (I'll never finish, if I write my own), I'd say that I'm confident I can get everything else to run "reproducibly" everywhere.

Arrrrrrrrr (orginal) [2017-10-08T17:04:04+02:00] view original

You have to define an interface between client and server anyway, that's the only place you should care. Regardless on whether your client/server is 32/64, i'm sure you are not going to use every bit of an int. Say, you want to transmit some ID field. Are you expecting to have more than a million IDs? If not, you'd simply define your ID message as an int32 and then convert with server.id = int(msg.id)/client.id = int(msg.id).

monster (orginal) [2017-10-08T19:19:18+02:00] view original

@Arrrrrrrrr

Wow, there's a lot of assumptions here... :D

You have to define an interface between client and server anyway, that's the only place you should care.

Beyond the simple transfer of data, I also want all the computations to give the same result (for reasons I explained earlier). I can totally get an integer overflow multiplying two quantities in 32bit, which I won't get in 64bit. If I don't check against the overflow (which I probably should), I will get different results. And if I do check for the overflow, the client code will fail, while the server code will not (and nor will the tests, because I want to run them on a server-class machine, so that it runs faster). Both cases will result in a "de-sync" of the client, forcing a disconnect, a complete reset of the client and a re-connect. So, atm, I can't agree with your statement. Possibly, I'm missing something which is obvious to you?

Are you expecting to have more than a million IDs?

What if I gave every bullet it's own ID? In a DOOM-style "death-match" that lasts an hour, that might not be an issue, but what if I want to create a vast, persistent game world, which is designed to spread over 1000s of (randomly generated) square kilometers, and run for many years without a reset (which, by the way, is exactly what I plan)? Would you still think I'm not going to reach a million IDs? I won't have a million active IDs at the same time, but I will, in total, over time. And it's way simpler to just use uint64 for IDs, than to implement some complicated ID-reusing system.

i'm sure you are not going to use every bit of an int.

How can you assume that without knowing precisely what I want to do? If it was a 64bit int, maybe not, but that's exactly the problem; on a 32bit client, it won't be 64bit, but rather 32bit.

Let's say I have some kind of whole-number currency in my game (credits, gold-pieces, diamonds ...) and I was stupid enough to use an "int" to store it. Once a player on a 32-bit client reached the 2 billions limit, their account would flip into the negative, while on the server, where int can go up to 2^63, everything would be fine. Have you ever experienced this situation? I did, in two different games so far, which is why I'm aware that int32 is a bad choice for a game currency. I guess you could say that an account balance will be part of the interface, and so it's "size" must be defined to something specific.

But there are also "transient", computed values, like the "total weight of all equipment", ... which are needed for performance, but do not need to be transferred through the interface, as they are derived from other values. If I only check the interface types, I might accidentally use an int for a transient value, since it's not part of the interface.

I might never have 2 billions players, or 2 billions messages, or 2 billions game entities, but I most certainly could have more than 2 billions of some "quantity".

I guess one special case where the size of int wouldn't mater would be when every single value had it's own data-type (distinct int32, for example), and all those data-types had hard-coded, programmer-defined, limits. If that limit is less than 2^31, then the behavior would be the same everywhere, and if the limit was over 2^31, I would get a compiler error while defining that constant and would be forced to use int64. Coding like that, OTOH, sounds like a PITA. But maybe that is what professional game devs do. As a "corporate Java programmer", I've never looked at a professional "native" game code-base, so I wouldn't know.

cdunn2001 (orginal) [2017-10-08T19:42:54+02:00] view original

perl can easily search your code for \bint\b, where backslash-b means "word boundary". That's easier than the special compiler option you seem to be requesting.

I would also use nim-msgpack for data-interchange between client and server, for extra type-checking and debugging.

But note that Nim is not the only language in the world, and safety is not its forte. Consider D, where "int" and "float" are always 32 bits:

https://dlang.org/spec/type.html

Arrrrrrrrr (orginal) [2017-10-08T20:15:11+02:00] view original

I won't have a million active IDs at the same time, but I will, in total, over time.

high(int32) returns 2_147_483_647, i'm sure that's more than enough. But, in any case, i can't think of many properties where you will need high values. In that case, i'm sure it won't be a problem to define by hand prop: uint64. But these will be the minority, so i would not bother into looking for complex solutions when we are talking about edge cases.

mashingan (orginal) [2017-10-08T20:42:39+02:00] view original

@Monster, somehow I can understand your standpoint about values in game.

I would consider to use libgmp but I guess you can implement limited bignum for your use?

jlp765 (orginal) [2017-10-09T00:05:15+02:00] view original

You know your requirements, so design to that. If you need more than high(int64), then use bignum or ....

The int type is different on different systems, so for your case, don't define anything as int. Use specific int sized types like int32 or int64 (or bignum, or ...) so that both ends are consistent. That will have speed implications (especially if you need bignum)

The point is valid that you must have some protocol to talk client/server, and it will define the type/size of the binary data you send/receive. Each end either converts or uses that data as is.

As I understand it, float is consistent across 32/64 bit platforms (maybe someone else can clarify that further).

And it's way simpler to just use uint64 for IDs, than to implement some complicated ID-reusing system.

Nim defaults to using int rather than uint (change of mindset from C), so you should really be using int64 rather than uint64. You can of course use uint64, but if so, there are specific uint operators for this (not doing this will add even more potential problems).

Arrrrrrrrr (orginal) [2017-10-09T13:08:08+02:00] view original

Couldn't render post #20386.

monster (orginal) [2017-10-09T13:44:34+02:00] view original

Thanks all for your comments. Just a few clarification:

I want to use UE4, and AFAIK, it expects C++ source code. So D/GO/... won't cut it.

I don't think I need "unlimited precision", but I also forgot about bignum; it's already available, and should produce the same results anywhere, which is my main requirement. Maybe I really should start with that, until proven that it causes performance problems. OTOH, I do prefer fixed-size (allocation free) data-types whenever possible.

int vs uint, or int64 vs uint64, is not really a concern for me. I usually code in Java, where we don't even have a uint/uint64. On the one hand, if a value is unsigned, and the valid lower bound is 0, then you never have to check the lower bound. OTOH, I remember from reading "The C++ programming language" recently, that one should use signed values even if negative values are not valid, because it's likely bad data is going to be negative, so with an int, you have a better chance of catching bugs. And catching bugs has higher priority then removing the "lower bound check".

Whenever I define an object field, I have to decide how big it will be; I see this as a critical design decision. So it's no big deal using int32 or int64 instead of int. Specifying the type of all the literal constants is rather more tiresome. But that is just an "annoyance", not really a "problem". The "problem" is that I don't trust myself to always remember to be explicit. I don't even have someone to do code-review, so some compiler/tool support would have been welcome. I guess I can just start by checking my code with a regex, as suggested, and eventually write a macro instead, if I the regex cannot cover all corner-cases.

mashingan (orginal) [2017-10-09T20:23:44+02:00] view original

If it's the case, I think it's better for you to make a type (maybe distinct type?) that when to work with int literal, it will convert it to uint64 (or int32/64).

When it's different type, compiler will tell you, for example, the proc +, won't work unless you define it first. So your concern of being "careless" will be checked by compiler.

monster (orginal) [2017-10-14T15:34:50+02:00] view original

I just found out, there is already one fixed-point Nim API: fixmath 16:16 fixed-point might not be the right scale for me (I would prefer 24:8), but it's a start, and can serve as example if I want to write my own.

Krux02 (orginal) [2017-10-14T17:02:48+02:00] view original

I remember reading from the factorio blog, that they had desync problems, because the math library in C++ can behave slightly different on different compilers for C++. Nim inherits this problem. I am sorry I can't quote it though.

Udiknedormin (orginal) [2017-11-05T02:48:41+01:00] view original

@Krux02

Isn't it true only for floating-point calculations?

Mirror of forum.nim-lang.org

3233 :: How to write (mostly) "architecture independent" (32/64 bits) code?