I'm just reading up on Nimrod and trying to get a feel for the properties of the language. There doesn't seem to be a concise summary for those who are familiar with programming language theory. From what I can gather, Nimrod supports structural typing (on tuples at least), parametric polymorphism (generics), subtyping for object types, object types that don't conform to a universal hierarchy (finally a non-braindead OO model), some kind of effect typing, macros (with templates as sugar), checked exceptions, overloading, iterators/generators, and a fixed set of operators with fixed precedence.
Safety Questions:
- What subset of Nimrod guarantees memory safety? eg. just avoiding pointers?
- What subset of Nimrod guarantees type safety? eg. just avoid pointers and casting?
Type theory:
- Do generic parameters support covariance and contravariance, or are they invariant?
- Does the effect system support effect subtyping? eg. foo(){effect1,effect2} can call bar(){effect1} and baz(){effect2}, but not vice versa
- I've seen a few examples of what looks like anonymous sum types in generic parameters, but I suspect it has more to do with the macro system, eg. from the manual: proc onlyIntOrString[T: int|string](x, y: T)
- Are exceptions subtyped, and if so, do exceptions require a universal supertype? eg. like System.Exception in .NET.
- What's the theory behind overloading in Nimrod? The manual on overload resolution is empty.
- Tuples are structurally typed, but are they subtyped as well? eg. tuple[name: string, age: int, ssn: string] structurally subtypes tuple[name: string, age: int]
- It's not clear whether you support sums/variants that aren't objects ala algebraic data types. The object variants section in the manual cites an example where access to invalid fields raises runtime exceptions, but I'm asking about variants that require matching on the tag before accessing fields to avoid runtime exceptions.
- I see "modules" in the Nimrod manual, but it's not clear whether Nimrod supports true separate compilation, ie. modules B and C can be separately compiled and linked at some unspecified later time. An example of non-modular compilation are languages that employ whole-program compilation.
- The manual states that nesting openarrays isn't permitted. Does that mean that you can't represent something like a skip list?
- Do arrays support subtyping? From the manual it sounds like the array length is part of the type, hence necessitating the openarray pseudotype for function parameters. So if we have a proc foo(array[0..5,int]), can we pass in an array[0..6,int]?
Low-level/Systems programming questions:
- How much control does Nimrod provide over structures and allocation without violating safety? My go-to example for this is the immutable hash array mapped trie [1].
- How much control over alignment and layout does Nimrod provide? eg. suppose you have to interface with some hardware which needs something like tuple[size: int, items: inline array[0..size, int]], or needs floats aligned on 16-byte boundaries or something.
Each Node theoretically has N entries like a N-ary tree, but instead of allocating an N-sized array for every Node whose entries might be mostly empty, the array of a Node is <= N and fully populated and the bitmap encodes what entries of the virtual N-sized array are actually populated (some bit-twiddling translates virtual index to real index). Most memory safe languages need two allocations to represent a Node in an immutable HAMT (Node + array), but really only one ought to be needed (Node + inline array). So does Nimrod allow expressing this safely? If not, can Nimrod efficiently express this unsafely, eg. via pointers and manually specifying GC roots?
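To make the shape concrete, the two-allocation layout that safe languages typically force looks something like this (my own rough sketch, names invented, leaf entries elided):

type
  PHamtNode = ref THamtNode
  THamtNode = object
    bitmap: int32            # records which of the N virtual slots are populated
    entries: seq[PHamtNode]  # second allocation: holds only the populated slots

What I'd like is the storage behind entries folded into the node itself, so a single allocation suffices.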
That's quite a list of questions you have here. :-)
> What subset of Nimrod guarantees memory safety? eg. just avoiding pointers?
> What subset of Nimrod guarantees type safety? eg. just avoid pointers and casting?
Since memory safety is highly correlated with type safety, allow me to conflate the two in my answer:
Avoiding cast, ptr, pointer and some conversions to cstring, which the manual even talks about (yay). Currently, var T used as a return type can also break memory safety, but we know how to fix it. Object cases have their issues when it comes to reset, which can now be fixed since the language has a proper notion of construction. reset is quite useful, however, so I think it should stay but be clearly marked as unsafe. Since the argument passing semantics allow for implicit pointer passing, there are also some subtle issues with aliasing. I plan to extend the alias analysis to deal with these at compile time, but in practice they almost never come up, so it's a minor issue.
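Roughly, these are the constructs I mean (a quick sketch, my own example):

var x = 123
var p: ptr int = addr x            # raw ptr + addr: outside the safe subset
var q = cast[ptr float](p)         # cast reinterprets the pointer: unsafe
echo q[]                           # reads the int's bits as a float
var s = "hello"
var cs: cstring = s                # the conversion itself is accepted...
s = "a different, longer string"   # ...but cs may now point at freed data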
> Do generic parameters support covariance and contravariance, or are they invariant?
Invariant, but people frequently request co- and contravariance... Also, the compiler still has a bug that makes it accept some explicit type conversions which are invalid.
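To illustrate the invariance (my own sketch, names invented):

type
  TBase = object of TObject
  TDerived = object of TBase

proc useAll(xs: seq[ref TBase]) = discard

var ds: seq[ref TDerived] = @[]
# useAll(ds)   # rejected: seq[ref TDerived] is not a seq[ref TBase] (invariant)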
> Does the effect system support effect subtyping? eg. foo(){effect1,effect2} can call bar(){effect1} and baz(){effect2}, but not vice versa
Yes, effect subtyping is supported and the effects themselves are in a subtype relation FReadIO <: FIO.
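For example (sketching from memory; the proc names and config file are made up, so take the exact spelling of the tags pragma with a grain of salt):

proc readCfg(path: string): string {.tags: [FReadIO].} =
  readFile(path)

# A proc that permits the broader FIO effect may call readCfg,
# because FReadIO <: FIO:
proc loadEverything() {.tags: [FIO].} =
  discard readCfg("app.cfg")

# The reverse is rejected: a proc restricted to FReadIO cannot call
# something tagged with the wider FIO.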
> I've seen a few examples of what looks like anonymous sum types in generic parameters, but I suspect it has more to do with the macro system, eg. from the manual: proc onlyIntOrString[T: int|string](x, y: T)
Type constraints for generics are rather ad hoc. Generics in general are like C++'s templates, not like ML's generics. I'm not sure I like that too much, but making generics macro-like in a language that embraces macros seemed like a good idea. ;-)
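For reference, the manual's example in use (the calls are mine):

proc onlyIntOrString[T: int|string](x, y: T) =
  echo x, " ", y

onlyIntOrString(1, 2)        # T is bound to int
onlyIntOrString("a", "b")    # T is bound to string
# onlyIntOrString(1, "b")    # rejected: both arguments must share the same T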
> Are exceptions subtyped, and if so, do exceptions require a universal supertype? eg. like System.Exception in .NET.
Yes and it's called EBase.
> What's the theory behind overloading in Nimrod? The manual on overload resolution is empty.
It is a "distance based" overloading resolution: (int,float) and (float,int) have the same distance to (int, int) and so produce an ambiguity. The distances between base types are adhoc. For generic parameters left-to-right unification is performed. I do not know if this even has an established name in computer science. I know there are better ways to specify this but unfortunately overloading resolution is performance critical. Which is no surprise really given Nimrod's design. It would be nice to build this on a stronger theoretical foundation but it works really well in practice and I'm not aware of any nasty surprises that are left.
> Tuples are structurally typed, but are they subtyped as well? eg. tuple[name: string, age: int, ssn: string] structurally subtypes tuple[name: string, age: int]
No subtype relation here and for good reasons: Alias analysis doesn't like subtyping.
> It's not clear whether you support sums/variants that aren't objects ala algebraic data types. The object variants section in the manual cites an example where access to invalid fields raises runtime exceptions, but I'm asking about variants that require matching on the tag before accessing fields to avoid runtime exceptions.
We don't support proper sum types. Nimrod's object cases are more flexible, and the compiler can already issue a warning when it can't prove the field access to be valid. Nimrod has flow-dependent typing for the not nil constraint, and it's very useful for a future lock statement too.
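A sketch of how that looks in practice (names invented):

type
  TShapeKind = enum skCircle, skRect
  TShape = object
    case kind: TShapeKind
    of skCircle: radius: float
    of skRect: width, height: float

proc area(s: TShape): float =
  case s.kind                  # match on the tag first
  of skCircle: result = 3.14159 * s.radius * s.radius
  of skRect: result = s.width * s.height

# Reading s.radius while s.kind == skRect raises a runtime error instead;
# the warning mentioned above is about accesses the compiler can't prove
# to be guarded like this.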
I see "modules" in the Nimrod manual, but it's not clear whether Nimrod supports true separate compilation, ie. modules B and C can be separately compiled and linked at some unspecified later time.
Separate compilation has been implemented and used to work; see the internal documentation for the details. It is, however, currently broken and a lot of work to support. The language is designed with separate compilation in mind.
> The manual states that nesting openarrays isn't permitted. Does that mean that you can't represent something like a skip list?
A skip list would likely simply use a seq (Nimrod's growable array) instead.
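Something along these lines (my sketch):

type
  PSkipNode = ref TSkipNode
  TSkipNode = object
    value: int
    forward: seq[PSkipNode]   # one forward link per level; length varies per node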
> Do arrays support subtyping? From the manual it sounds like the array length is part of the type, hence necessitating the openarray pseudotype for function parameters. So if we have a proc foo(array[0..5,int]), can we pass in an array[0..6,int]?
No subtyping for arrays and it wouldn't help anyway. openarray is the bridge so that both seq and array can be passed to a proc.
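For example (my own toy proc):

proc total(xs: openarray[int]): int =
  for x in xs: result += x

var a = [1, 2, 3]       # array[0..2, int]
var s = @[4, 5, 6, 7]   # seq[int]
echo total(a)           # fixed-size arrays of any length can be passed
echo total(s)           # and so can seqs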
Oh just noticed your list goes on...
> How much control over alignment and layout does Nimrod provide? eg. suppose you have to interface with some hardware which needs something like tuple[size: int, items: inline array[0..size, int]], or needs floats aligned on 16-byte boundaries or something.
Not that much as we're a bit crippled here by compiling to C. I control alignment with dummy fields fwiw. In the long term I want to copy Ada's features.
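The dummy-field trick looks roughly like this (invented names; it only controls offsets within the object, not the alignment of the allocation itself):

type
  TSensorBlock = object
    size: int32
    pad: array[0..11, int8]        # dummy field: pushes samples to offset 16
    samples: array[0..3, float32]  # intended to land on a 16-byte boundary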
> How much control does Nimrod provide over structures and allocation without violating safety? My go-to example for this is the immutable hash array mapped trie [1].
Well Nimrod comes close with:
type
  TNodeKind = enum nkLeaf, nkInner
  TNode = object
    case kind: TNodeKind
    of nkLeaf:
      strVal: string
    of nkInner:
      children: seq[TNode]
Note how seq[TNode], and not seq[ref TNode], is supported in the object. For your example you can also use unsafeNew, but this only works when 'k and 'v do not contain any GC'ed memory, as unsafeNew currently doesn't support a user-defined marker proc. This will be added in the future though.
Thanks for the answers, just a couple of followups:
Araq said: Type constraints for generics are rather ad hoc. Generics in general are like C++'s templates, not like ML's generics.
Are constraints parametric? ie. is this valid: proc foo[T0, T1: int|T0]
Araq said: Yes and it's called EBase.
I figured. Exception subtyping is fine, but a universal exception supertype is not great, for the same reasons a universal object supertype is not great. For instance, without EBase nobody can write a catch-all exception handler, and that restriction actually makes error handling much more flexible: you could implement other error-handling semantics, like Unix-style abort()/exit() and Go's fail-fast component-level error handling (via private exceptions that aren't part of the standard exception hierarchy), all within the same error model.
Araq said: It is a "distance based" overload resolution: (int, float) and (float, int) have the same distance to (int, int) and so produce an ambiguity.
But does this produce a definition-site ambiguity error, or a use-site ambiguity error? Assuming you statically type multimethods, you could presumably use the same rules for overload resolution as you use for dispatch. Something like [1], in which overloads must form a lattice to ensure no ambiguity. What you describe above sounds like the right start, if I'm understanding you correctly.
Araq said: Nimrod's object cases are more flexible and the compiler can already issue a warning when it can't prove the field access to be valid.
A warning or an error? How do you get rid of the error/prove to the compiler that your access is safe? If you can't do that, I don't consider the object cases to be more flexible, because it makes them somewhat unsafe compared to sums.
Araq said: Not that much as we're a bit crippled here by compiling to C. I control alignment with dummy fields fwiw. In the long term I want to copy Ada's features.
Yes, Ada's pretty good for declarative layout. Have you considered compiling to Ada? Ada's more powerful type system would probably catch a number of compiler bugs that C is ignoring.
Araq said: Note how seq[TNode], and not seq[ref TNode], is supported in the object. For your example you can also use unsafeNew, but this only works when 'k and 'v do not contain any GC'ed memory, as unsafeNew currently doesn't support a user-defined marker proc.
So you could possibly do this unsafely with 'k and 'v containing GC'ed memory by allocating using unsafeNew, then taking a GC_ref for every entry and performing GC_unref in the Node finalizer?
[1] Type Checking Modular Multiple Dispatch with Parametric Polymorphism and Multiple Inheritance