nimforum mirror - About language support for non-nullable pointers

marcianx (orginal) [2014-07-20T19:35:40+02:00] view original

Araq, I have so far played along with D and Rust and only recently looked into Nimrod. I love the meta-programming capabilities!

I know it's really late in the language design to bring this up now (apologies if it has already been discussed before; the searches I tried didn't bring this up).

I have a concern that there is no built-in way in the type system to say that I have a non-null pointer that one doesn't have to check against null (for performance reasons). This check is often easily overlooked because of mismatching assumptions. The default in the standard library seems to be to use dereferenceable nullable pointers which require either run-time checks or (for efficiency) risks of seg faults, which has been frustrating in my experience with the languages (C, C++) Nimrod is trying to replace (see also https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions ).

As in C++, one can implement such a type in Nimrod as in the code at the bottom. But ideally, non-nullable pointer types should be built into the language so that the compiler helps the programmer enforce non-nullability with static analysis where it is expected. And the standard library should use this type when expected. Eg it is noted here (http://nimrod-lang.org/tut1.html#strings) that most string functions cannot deal with nil. That should be a part of their type signature so that no runtime check would need to be done for null to raise an exception.

import strutils
var test : string
# Should be prevented by compiler by use of non-nullable string type.
echo test.capitalize() # Segfault!

A static type system can easily support such a type. As you are most likely aware, Rust deals with this by having Maybe pointer types (internally represented as nullable pointers so there is no performance disadvantage) that the compiler understands well and does static analysis on. And the default santized pointers Box<T> (previously ~T) and Gc<T> (previously @T) in Rust cannot be null and thus are safe to use. For interfacing with C and for low-level control, Rust also supports a nullable type *T, but its use is avoided in "safe" code which makes it easier to reason about things in safe contexts.

I wanted to know your thoughts on this. (And also whether you consider it far too late to make such a significant switch in the language.)

# Nullable and non-nullable references.
# (can similarly be done for ptr)

##############################
# Non-nullable pointer
# (can be dereferenced)
type NonNilRef*[A] = object
  p: ref A

# (ideally, this wouldn't make
# a copy of p's contents)
proc `[]`*[A](p: NonNilRef[A]): A =
  result = p.p[]


##############################
# Nullable pointer
# (cannot be dereferenced)
type OptionRef*[A] = object
  p: ref A

proc isNone*[A](p: OptionRef[A]): bool =
  result = p.p == nil


# Constructors
proc fromRef*[A](p: ref A): OptionRef[A] =
  result = OptionRef[A](p: p)

proc some*[A](p: ref A): OptionRef[A] =
  if p == nil:
    raise newException(EInvalidValue,
      "Give me SOMEthing, not NILthing!")
  result = OptionRef[A](p: p)

proc none*[A](): OptionRef[A] =
  result = OptionRef[A](p: nil)


##############################
# Conversions
proc unwrap*[A](p: OptionRef[A]): NonNilRef[A] =
  if p.isNone():
    raise newException(EInvalidValue,
      "Nothing to unwrap!")
  result = NonNilRef[A](p: p.p)

proc wrap*[A](p: NonNilRef[A]) : OptionRef[A] =
  NonNilRef[A](p: p.p)


##############################
# Test

#var a : ref int
#echo a[] # Segfault!

var m = some(new(int))
#echo m[] # Compiler error :)
var p = m.unwrap()
echo p[]

var n = none[int]()
#echo n[] # Compiler error :)
# Runtime exception
# (ideally could even be compile-time with static analysis)
var q = n.unwrap()

dom96 (orginal) [2014-07-20T20:54:05+02:00] view original

Hello marcianx!

There in fact is already a "not nil annotation" (see here). From what I recall it's still not fully implemented. But it's a start and the plan for the future is to improve it and get the standard library to use it.

marcianx (orginal) [2014-07-20T22:51:01+02:00] view original

Great! Sorry I missed that!

marcianx (orginal) [2014-07-21T00:39:28+02:00] view original

Actually, I would like to add that "not nil" doesn't handle the issue of nullable pointers being dereferenceable, which is a common cause of problems. Ideally, pointers wouldn't be allowed to be dereferenced (at least in "safe") code until they were known to be non-null, much like the implementation I proposed above.

Perhaps it might make sense to make ptr and ref non-nullable by default and define nullable types via

var x : enum ref Obj, nil

which the compiler would special-case to implementing underneath with a nullable pointer.

asterite (orginal) [2014-07-21T19:04:40+02:00] view original

Also consider the way it's done in Crystal (a way which I haven't seen in any other language). There you have a Reference type from which you can inherit and you have a different Nil type. So if you have something of type String, that's a String, never nil. If you have Foo, that's Foo, never nil.

Now, you can have something whose type is String or Nil. Then when you invoke a method on it, the method must be present in both types. This further extends to having values whose type can be one of any combination of types: Something or Nil is just a particular (very common case). Since it's common, we internally represent this "union" with a simple pointer: the null pointer is Nil, the non-null pointer is for the other type.

I don't know if Nimrod has union types, but a similar rule could be implemented: can only invoke a proc if it's defined for all the types in the union. That would involve a dynamic dispatch, but it's simple (it's just an if, which is what you would do anyway).

Araq (orginal) [2014-07-21T20:57:41+02:00] view original

I don't know if Nimrod has union types, but a similar rule could be implemented: can only invoke a proc if it's defined for all the types in the union. That would involve a dynamic dispatch, but it's simple (it's just an if, which is what you would do anyway).

Er ... no, it's not "simple" even if it ends up generating only an if. Classical error in reasoning.

Araq (orginal) [2014-07-21T21:27:57+02:00] view original

Actually, I would like to add that "not nil" doesn't handle the issue of nullable pointers being dereferenceable, which is a common cause of problems. Ideally, pointers wouldn't be allowed to be dereferenced (at least in "safe") code until they were known to be non-null, much like the implementation I proposed above.

Well in your ideal world I am forced to write if foo.isNil: internalError("foo is nil") everywhere in the compiler, and even that is an optimistic syntax relying on a quite elaborate control flow analysis which no common functional programming language (and neither Rust) provides. Maybe this means the types in the compiler have been poorly designed to begin with, but I still can't see a better way to design them.

A nilable pointer is NOT a language design mistake, especially not for "systems programming" but this requires a rather lengthy explanation. Maybe I'll blog about it. But please note that:

you can't simply port ML's type system to Nimrod and call it a day

the problem has been amplified in Java (reference types everywhere; foo.method() fails when foo is nil) and is not nearly as bad in other languages IME.

marcianx (orginal) [2014-07-22T05:42:57+02:00] view original

Araq: A nilable pointer is NOT a language design mistake, especially not for "systems programming" but this requires a rather lengthy explanation. Maybe I'll blog about it.

Araq, I happily anticipate your blog post. I would like to elaborate on my concerns to make sure you have a chance to address them. Perhaps answering them here might help bring our perspectives closer together.

I want to emphasize that my concern is not with a nilable pointer itself -- the option type I defined up top is a nilable pointer. My concern is with allowing the dereference of a nilable pointer. Please help me understand why one would ever want to do that.

If the pointer can be nil, then you want to null-check it before dereferencing unless you consider segfaulting a feature (should it ever be?).

If the pointer can't be nil, then you're using the wrong type.

As you are most likely aware, it is much worse in C/C++ where dereferencing a pointer is undefined behavior so that compilers use the presence of dereferencing to imply that the pointer is not nil (and sometimes incorrectly so due to programmer error). Why have a compiler imply such a thing when the type system can easily enforce it?

Araq: Well in your ideal world I am forced to write if foo.isNil: internalError("foo is nil") everywhere in the compiler, and even that is an optimistic syntax relying on a quite elaborate control flow analysis which no common functional programming language (and neither Rust) provides.

Not at all! Given my initial implementation up top, it would just be foo.unwrap(). The programmer in this case is indicating explicitly that he/she expects a non-null type and unwrap enforces it with a mandatory assertion. Also, after foo.unwrap() is inlined, the compiler can easily remove the redundant check if it is already guarded with a null-check (if !foo.isNone(), with isNone() also inlined).

Your statement about code being riddled with if (or unwrap()) is under the assumption that you would be working with nilable types most of the time. Except possibly when interfacing with external libraries, I expect the use of nilable pointer type to exist only when you actually intend to check them (eg. the nodes of a binary tree).

Jehan (orginal) [2014-07-22T19:48:39+02:00] view original

The question of how to deal with null references is non-trivial. Eric Lippert lists some of the considerations that went into design of nullable types for C# here.

There is a major difference here between functional and imperative programming language and that is that in imperative programming languages partial initialization and partial transformation of state is commonplace (and an important reason to choose an imperative language in the first place). For example, consider the implementation of a generic type Stack[T] (or Queue[T]) using an array-based implementation. In order for that to be efficient, you'll allocate a short array initially, then grow it by a constant factor as it needs to expand. But: you need to fill the empty reserved slots with something. In the case of references, nil can be that something. But in order for that to work for all types, nil must work for all reference types. Informally, you can say that nil implements the axiom of choice for the type system.

You can work around it to some extent, but at the cost of complicating your implementation (or significantly complicating the language and compiler). E.g., you'd start out with an empty stack and as soon as you add the first item, you use that element as the filler. That's sort of doable for stacks, but becomes rather complicated with queues, where you have to be careful not to leak memory and keep performance intact (i.e., as soon as the filler element is removed, you have to choose a new filler element).

Note that I'm not saying that you cannot/should not have non-nullable references as the default; just that the tradeoffs and implications are very much non-trivial.

Araq (orginal) [2014-07-23T10:01:42+02:00] view original

In addition to what Jehan said, let me address your concerns directly:

If the pointer can be nil, then you want to null-check it before dereferencing unless you consider segfaulting a feature (should it ever be?).

Yes, segfaults are a (very poorly implemented) feature. Note that the spec doesn't say that a nil exception can be caught, so it's a poor implementation of a fatal error. And there are plans to implement it properly by inserting runtime checks in debug mode.

If the pointer can't be nil, then you're using the wrong type.

No, I'm using a nilable pointer that is not nilable in this context due to a complex invariant that I can't express in the type system.

Your statement about code being riddled with if (or unwrap()) is under the assumption that you would be working with nilable types most of the time.

Well yes, that's what I'm doing. I work with "can't possibly be nil here, but the type system is too weak to express the complex invariants that lead to this conclusion". I suppose most people do not work on a complex piece of code such as a compiler and are not served well by this feature but then Jehan's points apply. ;-)

marcianx (orginal) [2014-07-24T01:50:20+02:00] view original

Thank you both for your comments. Sorry, my previous post was a little more extreme, essentially ignoring complex constraints. And indeed, both D and Rust have a notion of "unsafe" code which allows one to work with a more complex invariants beyond the more restrictive constraints imposed by the type system, eg. when implementing a data type. However, they localize this as much as possible (in "unsafe" blocks) but allow one's data types to expose these constraints within the system. Something similar can be done here via:

proc unsafe_unwrap*[A](p: OptionRef[A]): NonNilRef[A] =
  assert(!p.isNone())
  result = NonNilRef[A](p: p.p)

The Stack/Queue type implementation would use OptionRef, but would return only NonNilRef. Again, I don't see a particular reason to allow dereferencing a nilable pointer until the programmer explicitly asserts its non-nilability by use of the above method and thenceforth uses only the non-nilable type.

Of course, none of this comes free -- the programmer writing the abstract data type (Stack[T]/Queue[T]) has to work with two types underneath and a conversion operator between them (unsafe_wrap() and wrap()). (For non-pointer types, these conversion operators would be identity().) But the users of the data type would be the benefactors.

(This approach might also be generalized to Stack/Queue[T] where T is some expensive-to-initialize struct and doesn't want to run the constructor for the unused entries. So one can do unwrap/wrap conversions between the "unsafe" raw bytes in the array and an initialized object. But I digress...)

In cases where one really, really needs to routinely deference the option type, one could locally define a separate dereference operator (could import something like import unsafe.deref which defines [] for OptionRef or define another method p.deref()), but most code should avoid this so that the compiler can help prevent nil-dereferences.

Araq (orginal) [2014-07-24T08:52:21+02:00] view original

Hey we do have a notion of unsafe code too, it's just that cast and addr are already keywords and so there is no point in having to wrap that in an unsafe block. They already scream at you. ;-)

BTW we also have an effect system and a taint mode and regionized pointers and integer type subranges which Rust and D and Haskell lack. Lifetime annotations and non nullability are not everything -- compile-time safety is a wide topic.

makoConstruct (orginal) [2015-11-14T08:32:34+01:00] view original

I second marcianx's vote of anticipation here, I'd really like to see a blog post explaining why reference types being nullable by default should not be considered a design flaw, because from an outside perspective it really does look like... Well to be straight with you I'm just pretty sure that it is and it will take a pretty thorough essay to bring me around.

Jehan (orginal) [2015-11-14T15:37:16+01:00] view original

@makoConstruct: I wrote about it at length in this thread.

Araq (orginal) [2015-11-14T20:18:47+01:00] view original

Well now that NimCon is almost over there have in fact good ideas been brought up of how a solution might look like that works well with Nim, so I'm quite excited and we indeed might see "not nil" become the default in the language, but take this with a big grain of salt. Ideas which look good on paper might turn out to not work at all. Either way, there is hope for Nim. ;-)

Mirror of forum.nim-lang.org

504 :: About language support for non-nullable pointers