nimforum mirror - Understanding memory safety compared to Golang

davidpelaez (orginal) [2016-01-21T18:43:38+01:00] view original

Hello.

I want to start by apologizing for the potential simplicity of my question for seasoned developers in C, C++ or other systems programming languages where you actually think about memory management (as opposed to dynamic ones like Ruby).

I have read several threads on HN about Rust's guarantees of memory safety and how on practical coding this doesn't seem to be a problem in Nim even though it has weaker guarantees on this area. From my understanding the GC in Nim and memory allocation per threads would be similar to Golang, am I correct in this?

Specially, how would you compare the "risks" of memory management compared to Go? I understand that there's a way to control the garbage collector and do some unsafe operations, but I'm really not interested in these deeper parts of the language and I'm actually understanding how using Nim would compare to something like Golang. That's of course if those parts of the language are indeed isolated and I cannot "accidentally" use them without any special syntax. Am I correct in that these parts are "special" similar to what go's Unsafe package offers? Check the link of unsafe in Go here: https://golang.org/pkg/unsafe/

There are several perspectives on what makes Nim interesting, for me syntax, a richer type system (even though still simple for newcomers) and macros make it a huge win compared to something like Rust or Go. Therefore I would like to understand what I would "lose" or "win" if trying to replace Go with Nim. It seems to me this could be mostly related to the "safety" guarantees but maybe I have it all wrong. Could you help me get this in more detail?

I'm not trying to put a very subjective discussion in place in things like "the syntax is cleaner", I would like to understand some more objective things like "In nim you could leak information of a random variable of your program if you put read an index out of bounds in an array". That kind of thing is very important for me because I'm actually an HTTP API developer more than anything, not a game developer willing to tackle the fine grained complexity of memory management.

I have been researching for months what alternatives I have to replace Ruby and I deeply dislike Go. In my case security concerns and understanding how I could shoot myself on the foot are the most important points and from what I see these can be more objective to understand benefits and downsides when performing comparissons between languages.

If it's too vague, please let me know and I can clarify my intent. However consider that I'm mostly looking in an area that I don't know very well so many different answers could be helpful!

Thanks!

Jehan (orginal) [2016-01-21T22:16:03+01:00] view original

No, Nim doesn't have stronger or weaker safety guarantees than Rust [1]. Rust's memory safety is nothing new, either. It is mostly that some older languages like C/C++ are the exceptions in not being memory-safe. There is nothing new or magical about memory safety. LISP was already memory-safe when it was invented in 1958. The only question is how much performance you need to trade away for it (the value is never zero for non-trivial programs, but can vary greatly, depending on whether the language was designed with it in mind or not).

The main difference between Rust and other languages is that it does some more (but not all [2]) safety checks at compile time rather than at runtime. It also allows you to avoid GC, but does not provide you any memory safety over GC. Rust's borrow checker allows you to statically prove that references are live [3]; a GC simply avoids deallocating any memory that has a live reference to it (on the other hand, a GC can ensure that references remain live even where this is hard or impossible to prove statically). The end result is the same with respect to memory safety (the reason some people want to avoid GC is for performance reasons, not memory safety).

Note that the runtime checks do have a potential overhead (the compiler will be able to remove some, but not all). But this is unavoidable for memory-safe code (being able to do so statically would imply solving the halting problem or a language that is too restrictive for practical use). Unavoidable runtime checks typically can occur when accessing elements in an array (especially a dynamically sized array) or fields of a polymorphic type.

There is no unsafe keyword in Nim, but all unsafe features in Nim (such as ptr or addr) have their own keywords already.

As for Nim vs. Go: Nim is more expressive, but Go is simpler. Either goal can be preferable, depending on your requirements.

[1] There is a rather technical concern in that generating C/C++ code can create some potential issues with undefined behavior; this is about the backend, not the language. These can also be avoided if necessary [4, 5].

[2] You will not be able, for example, dispense with runtime boundary checks for arrays entirely; compilers can prove that they are not needed only in some cases, not all, since an index can be an arbitrary computable expression.

[3] Which, incidentally, is a damn impressive accomplishment.

[4] To make Nim memory-safe, compile with -d:safe or -d:release -d:safe and the following lines in your config:


@if safe:
  gcc.options.always = "-w -fpermissive -fno-strict-overflow -fsanitize=null,shift -fsanitize-undefined-trap-on-error"
  gcc.cpp.options.always = "-w -fpermissive -fno-strict-overflow -fsanitize=null,shift -fsanitize-undefined-trap-on-error"
  clang.options.always = "-w -fpermissive -fno-strict-overflow -fsanitize=null,shift -fsanitize-undefined-trap-on-error"
  clang.cpp.options.always = "-w -fpermissive -fno-strict-overflow -fsanitize=null,shift -fsanitize-undefined-trap-on-error"
  obj_checks:on
  field_checks:on
  range_checks:on
  bound_checks:on
@end

[5] AFAIK, there is also at least one remaining bug in the allocator where you could technically violate memory safety if the object size calculations overflow under certain circumstances. However, that is a bug, not intended behavior.

davidpelaez (orginal) [2016-01-21T22:58:39+01:00] view original

I appreciate the patience to writing your answer. I was certain I was understading some parts of it wrong and you explained clearly about memory safety and how that's entirely different to performance related to the Gc.

Rust makes memory assertions at compile time rather than at runtime with the GC. In either case I'll have an error for instance reading form an array, but I won't have the risk that C has of leaking data from other parts of the program, is that correct?

I agree in that neither simplicity nor expresiveness are bad per se, I'm very interested in Nim for its expresiveness. But apparently I don't have "extra" concerns besides what I'd have in Go regaring the security of my code at a low level. Except for the bug you mention of course.

I see the suggested config, what would I risk if I don't include those configs? Leaking data from memory would then be an option?

Thanks again for taking the time to reply.

zielmicha (orginal) [2016-01-21T23:25:43+01:00] view original

If you compile with '-d:release' your program is essentially on its own - there are no overflow checks, array range checks etc. Without this flag (and without '-d:safe') there is mostly theoretical risk that compiler will generate incorrect code for nil (null) dereference or similar (compilers are allowed by C standard to do so, but in practice no compiler does).

That said, there are some "language bugs" that make unsafe code possible, but I wouldn't be bothered (even Python has them and golang too (?)).

davidpelaez (orginal) [2016-01-22T02:37:47+01:00] view original

What would be the implications of the program "being on its own"? unexpected crashes if I make a mistake or simply it would risk leaking data? I understand there's a performance penalty, so I'd like to know what trimming that penalty off would entitle.

It's good to know they're bugs and not expected behavior. Specially that they're known which is the most important part I think. Will those be fixed before 1.0? Do this bugs mean that we should avoid Nim in production and keep an exploratory perspective in the meantime?

Varriount (orginal) [2016-01-22T02:55:18+01:00] view original

In general, -d:release turns off all runtime checks (I can't remember if it turns all compile time range checks off though).

This means the following are turned off:

Object conversion checks

Case variant field checks

Range and bounds checks

Overflow/Underflow checks

Floating point exception checks

Now, you might be thinking, "Oh god -d:release turns my program into a ticking time-bomb", but the reality is a quite less hyperbolic.

Use of a garbage collector prevents most memory leaks, and use of 'for' loops prevents a large number of off-by-one errors when iterating. Qualifying types with not nil enforces static checks against null-reference errors, and a flexible type system means that type safety doesn't have to be sacrificed all that often.

Furthermore, turning checks on/off doesn't have to be on a program-wide basis. Option pragmas allow for enabling/disabling these checks on a per-procedure basis. This means you can compile your program with checks on, profile code, and disable checks in the areas that don't need them.

Jehan (orginal) [2016-01-22T03:17:42+01:00] view original

davidpelaez: I see the suggested config, what would I risk if I don't include those configs?

First, the risk is not really quantifiable as part of a language alone (we'll get to that in a moment). The options I listed are basically the minimum to guarantee memory safety.

Second, memory safety is neither a necessary nor sufficient condition for not being exploitable. It just so happens that violating memory safety isone of the major sources of exploitable software defects. But there are plenty of other sources, too (SQL injections, broken program logic, and more). No program is 100% bug-free; not compilers, not the libraries you use [1]. Memory safety is one way to reduce your attack surface (and it's as close to a free lunch as you can get), but there's so much more that can go wrong. And Ted Unangst demonstrated how you can leak secrets even with memory safety by reusing memory-safe buffers (while he used Rust as an example, the approach can be used with pretty much any memory-safe language).

Again, you generally want memory safety because it's as close to a free lunch as you can get in this area, but it's neither a hard guarantee (too many parts of your software stack may still be unsafe somewhere) nor does it prevent all security-related issues.

[1] Even when you formally prove their correctness, your proof may still have bugs. They're going to be much rarer, but are not impossible.

jibal (orginal) [2016-01-22T06:09:39+01:00] view original

In either case I'll have an error for instance reading form an array, but I won't have the risk that C has of leaking data from other parts of the program, is that correct?

You refer repeatedly to "leaking data". From the context, it seems that you mean memory corruption. Using the right term makes it easier to find information about it, e.g., https://en.wikipedia.org/wiki/Memory_corruption.

It's still not easy to make out what you're saying. Reading from arrays isn't an error ... perhaps you mean buffer overflow, which results from writing to an array using an invalid index, and thereby writing outside the array. Note that an index might have an arbitrarily large or small value, so any memory could be affected, which could affect "other parts of the program". Even a store to just one address above or below an array could affect an unrelated variable, and thus other parts of the program. The way to guard against this (aside from careful coding and using iterators rather than indices when possible) is to turn on bounds checking. None of this has anything to do with GC or not, and really doesn't have anything to do with the language, except that C/C++ have no bounds checking and it's difficult to add it to a compiler because of the way arrays and pointer arithmetic is defined. Just about every other language provides a mechanism for guaranteeing array integrity. Note that the difference between "systems programming languages" like Nim or Rust vs. languages like Java or Ruby is that the former are performance-focused and so they allow bounds checking to be turned off, and often that's even the default. But at runtime, Ruby has to do the very same array bounds checks as Nim with bounds checking does. Another difference is that the idioms for manipulating arrays in "higher level" languages such as Ruby tend to work with iterators rather than raw indices, but this is changing ... "modern" programming languages support a functional paradigm. Even C++ has moved in this direction.

All that was about arrays and indexing. The other primary source of memory corruption is invalid pointers. This is where the "safe"/"unsafe" constructs come in. Unlike with array indices, a runtime check for pointer validity is not feasible, so safety is obtained via a memory-safe semantic model and, where possible, compile-time checks for adherence to that model. For instance, heap memory that is explicitly freed could be used after the free (including a second free), resulting in corruption, so we see implicit freeing mechanisms like garbage collection, reference counting, and "smart pointers". And using stack-allocated memory after it goes out of scope can result in corruption, so we see things like RAII and Rust's lifetime tracking. A common cause of memory corruption is the use of uninitialized memory, especially uninitialized pointers, so languages like Rust have restrictions aimed at preventing their occurrence. So-called "dynamic" languages like Perl or Ruby achieve this by pre-initializing all variables and arrays with an "undefined" value at the beginning of their lifetime and doing runtime checks for it, but that's considered too expensive for systems programming languages. Aside from uninitialized memory, there can be pointers that contain incorrectly calculated addresses. This is possible at a whim in C or C++, but in a language such as Nim it requires the use of such constructs as addr and ptr that are explicit "unsafe" markers. Avoid those and uninitialized memory and your program should be pretty safe ... but there is no guarantee. Even a program that is formally safe might call a library function that is implemented using unsafe features and has a bug ... or a compiler bug could generate incorrect code with resulting arbitrary behavior. (Interpreted languages are far safer in this regard, but there's a high performance cost.)

cdunn2001 (orginal) [2016-01-22T10:15:20+01:00] view original

Note that the problems prevented by Rust's lifetimes can usually be detected easily at runtime by a memory checker. NASA needs to prevent all bugs. The rest of us only need to fix them quickly.

Anyway, Nim is great for an efficient server, and because of its javascript code generation, it can be interesting for web-development. But I think you should consider Haxe.

However, if you really are drawn to Nim's practical type-system, then maybe you will be able to build some useful web-related libraries for the open-source Nim community.

Araq (orginal) [2016-01-22T10:32:54+01:00] view original

Interpreted languages are far safer in this regard, but there's a high performance cost.

Interpreted languages are rather quick at introducing eval though and reflection is everywhere in C# and Java. In the previous decades nobody gave a shit about safety, especially not programming language designers. But who can blame them, nobody wants to use Ada. Nim is essentially Ada + a GC with a friendlier syntax but nobody noticed and instead I get blamed for every single undefined behaviour in the C spec.

Jehan (orginal) [2016-01-22T11:49:39+01:00] view original

I have to dig up my notes on range_checks. I thought the same, but I remember that there was a corner case where leaving it out caused issues.

About field checks: yes, the compiler could do that, but for most practical use cases, it's just not a viable option. In any event, right now the compiler uses a union, so it's necessary.

However, field checks do not normally cause any extra overhead, since you need to dispatch based on the object variant, anyway, and the code generator eliminates the duplicate comparison. It protects you in case you access a field without a check based on assumptions that may not be correct. So, there is normally not much of a reason to leave this option out.

cdunn2001 (orginal) [2016-01-22T16:45:24+01:00] view original

@Araq said:

Nim is essentially Ada + a GC with a friendlier syntax

Wow! That's true. I learned Ada a long time ago and completely forgot about it. That's a great way to describe the particular form of safety offered by Nim.

davidpelaez (orginal) [2016-01-22T17:27:09+01:00] view original

@jibal I meant exactly memory corruption which is a larger concept indeed. I didn't know it, thanks for pointing that out. I did mean buffer overflow in my example and sinsce C/C++ have no bound checks memory corruption is possible in a way that I haven't seen in dynamic languages (don't know even if it would be possible besides a bug in the interpreter). My concern is the risk of introducing unwanted behaviour, for instance exposing data by reading an index out of bounds. That's what I meant by "leaking data". But this is greatly reduced keeping the checks when compiling Nim as @Jehan suggests.

@Jehan I understand that there are many other sources of security compromises. I work on software for financial services where it would make sense to write some microservices in very performant languages because they provide very common operations to a cluster, e.g. timestamping hashes for audit purposes. Given the case I know how to handle security in the message processing and transport but I don't have much experience with compiled languages specially powerful ones likes Nim that let you go very low level, hence my focus on that.

@cdunn2001 thanks for sharing Haxe, I'll take a look. This opens yet again the questions of where each language is better fitted. I've seen this topic in many places and I wish there was a clearer view on where to use Nim or not to. Is there any specific reason why you wouldn't use it to replace a microservice with high concurrency written in Golang?

@Araq "Nim is essentially Ada + a GC with a friendlier syntax but nobody noticed and instead I get blamed for every single undefined behaviour in the C spec." Seems like it, from my limited research on the topic many things are related to the fact that Nim compiles to C. But with the language becoming more stable at some point in the future a different "backend" could make sense and this will disappear. The most important thing is that you have created something with a very unique appeal among the current languages in the game ;)

Generally speaking I have learned a lot from this thread. Thanks! As a conclusion I'd say that given the proper compilation flags and not using ptr or addr would result in similar benefits to what golang could offer regarding memory safety. I don't see much of a reason why Nim isn't a good alternative to Golang for microservices where I want more expresiveness, but please let me know if you see clear downsides as to why this wouldn't be a good idea.

cdunn2001 (orginal) [2016-01-23T17:20:02+01:00] view original

davidpelaez asked:

Is there any specific reason why you wouldn't use it to replace a microservice with high concurrency written in Golang?

There are 3 strong reasons to use Go for enterprise (i.e. corporate) development:

Large ecosystem

Goroutines/gochannels

Restricts developer creativity

I think Nim is catching up quickly. And really, if an ecosystem is what you need, consider C#, which is an awesome language, if you have money.

Someone at my company suggested that debugging a transpiled language can be difficult. Actually, I consider that a feature of Nim. I can often discover how something actually works by looking at the (highly readable) generated C code.

As stefantalpalaru said in another thread (http://forum.nim-lang.org/t/1278):

Go's channels have more features than Nim's ones (http://nim-lang.org/docs/channels.html) and the select on multiple channels is particularly useful. Even in the gccgo implementation they have seen significant battle testing and refinement.

However, most novice Go devs don't really know how to use Goroutines wisely. See my blogpost, if you have tried the "Web Crawler" exercise in the Go tutorial. When Nim 1.0 has added multi-core support to async await, I think the Nim equivalents will be fine, and maybe more transparent than Go.

By the way, if concurrency is the main issue, I would consider Erlang, and if I liked that I would go with Elixir.

This is the big one. I get calls from recruiters all the time looking for Go devs. It's funny because Go is such an easy language to learn, and its restrictiveness means that even crappy coders will produce fairly readable code.

Personally, I believe that Nim makes excellent trade-offs between safety and friendliness. But it has more pitfalls than Go. And Nim's powerful macros will always allow your self-professed hotshot coders to write code that nobody else understands.

More important, many good coders have concluded that Go is too restrictive (among other problems) and would never accept employment to code in Go. I've moved on. In my mind, Nim dominates Go.

But you are mainly interested in security. I can tell you that when I worked for one of the Big IT companies, Go was excluded because its ssl library was not trusted. I thought that was silly, since its simplicity makes bugs less likely, and Go's ssl was written by experts. Besides, openssl was written by top security experts, and they're still finding nasty bugs. Complexity is the enemy of security.

Nim's strong type-safety, transparency, and minimalism would make it my first choice for security work, especially given the ease of wrapping C code in Nim. (The standard Nim ssl library is not sufficient for serious security work, but it should suffice for http calls.) For reference:

http://nim-lang.org/docs/ssl.html

http://nim-lang.org/docs/openssl.html

https://github.com/jovial/nim-openssl-evp

brianrogoff (orginal) [2016-01-23T18:08:29+01:00] view original

Araq: Nim is essentially Ada + a GC with a friendlier syntax

I wanted to use Ada for a while back in the day, and I like a lot of the language, but there are definitely some areas (macros, definition of new operators) where Nim rejects some key Ada philosophical positions. In any case, I like almost all of the Nim 'enhancements' over Ada. Well written Nim is a pleasure to READ, and that was a key part of the Ada philosophy, that readability is paramount.

One of the few things I miss from Ada is the ability to have locally scoped imports. D and OCaml have this too. Also, lately I've come to think that Nim's OOP is a bit too much and that we'd be better served by a simpler system, perhaps one more like Ada 95/05.

To the OP: I'd pick Nim over Go unless there's a strong external reason (job, collaborator's opinion, ...) to pick Go. You already indicated your distaste for Go. Nim still has bugs in its implementation, and you may run into these more than you would similar ones in Go. Go is simpler, and has more people working on it. Rust is interesting to you too; I like it quite a bit and it has a promising future, but I still find it rather heavy to program in compared to Nim, not simply because of the borrow checker. I'm looking forward to Nim 1.0 (this year :-/) and a period of stability and cleanup.

Araq (orginal) [2016-01-23T19:10:35+01:00] view original

I wanted to use Ada for a while back in the day, and I like a lot of the language, but there are definitely some areas (macros, definition of new operators) where Nim rejects some key Ada philosophical positions.

True but these philosophical differences are all irrelevant when it comes to memory safety.

Chris660 (orginal) [2018-06-29T21:49:15+02:00] view original

Nim is essentially Ada + a GC with a friendlier syntax but nobody noticed

I noticed! It's what prompted me to start learning Nim actually :-)

juancarlospaco (orginal) [2018-06-30T18:24:33+02:00] view original

I try Go on the past, compared to Nim Binaries are much more smaller, Hello World on Go weights 2.5Mb on my system, Go Compiler Binary itself weights 350Mb on my system, size sometimes increases with new releases, I can not make my Nim binary (release) even reach 1Mb, also Ive read that is not safe (or at least not best practice) to use strip and upx on Go Binaries.

Its really hard to use its FFI with C or Python (no bridge with Python3.7, CGo is slower).

mashingan (orginal) [2018-07-02T02:03:16+02:00] view original

I'm forced to use Go now :( and so far it's really painful experience. (Considering the other team members are pretty much newbie in Go too, so they aren't really using Go's most prided feature, Goroutine, not that I can say much about it to them though).

Having used to Nim and then have to write with Go, it's just like have to live a backward civilization (this is my opinion, not necessarily true for others).

Mirror of forum.nim-lang.org

1961 :: Understanding memory safety compared to Golang