nimforum mirror - Plans for improving tagged enum (ADT) syntax?

RodSteward [2022-11-26T13:13:13+01:00]

While Nim has a generally pleasant syntax, one obvious area where it can improve is the tagged enum syntax. Right now it is rather complicated and goes against Nim's ethos of concise syntax.

This post explains it quite well.

https://forum.nim-lang.org/t/904#5441

As you can see the current syntax is a bit too close "to the metal" and a more abstract syntax would be nice. Rust popularized this, Swift made it more usable and tagged enums are not going away.

An example would be:

variant Foo:
  Alpha(i: int)
  Beta(f: float)
  Gamma(s: string)

let x = Foo.Beta(f: 1.0)   # or just Beta if no collisions

case x
of Alpha(i: i):
  echo(i)
of Beta(f: f):
  echo(f)
of Gamma(s: s):
  echo(s)
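
For comparison, here is a rough sketch of how the same type is written with today's object variants. The names FooKind, fkAlpha/fkBeta/fkGamma and describe are made up for illustration and are not part of the proposal.

```nim
type
  FooKind = enum
    fkAlpha, fkBeta, fkGamma

  Foo = object
    # The discriminator field selects which branch's fields exist.
    case kind: FooKind
    of fkAlpha:
      i: int
    of fkBeta:
      f: float
    of fkGamma:
      s: string

proc describe(x: Foo): string =
  # Branching happens on the discriminator; fields are read by hand.
  case x.kind
  of fkAlpha: "Alpha " & $x.i
  of fkBeta: "Beta " & $x.f
  of fkGamma: "Gamma " & x.s

let x = Foo(kind: fkBeta, f: 1.0)
echo describe(x)
```

The separate enum declaration and the explicit kind field are the parts the proposed sugar would generate.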

Another possible improvement is adding lowering for the optional type, just like in Swift.

var x: string?

becomes

var x: Option[string]
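
For reference, this is roughly what the lowered form looks like when written by hand with std/options today. The variable names are illustrative only, and `string?` itself is not valid Nim.

```nim
import std/options

var x: Option[string]            # what `var x: string?` would lower to
doAssert x.isNone                # a default-initialized Option holds no value

x = some("hello")

# map applies the callback only when a value is present.
let length = x.map(proc (s: string): int = s.len)

# get with a second argument supplies a default for the empty case.
let fallback = none(string).get("n/a")

echo length, " ", fallback
```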

Optional chaining similar to Swift and C#:

https://docs.swift.org/swift-book/LanguageGuide/OptionalChaining.html

In Swift, optional chaining is also just lowering.

Any plans for this?

DeletedUser [2022-11-26T15:11:47+01:00]

Before someone gives the stock answer of "you can just use macros": I agree that this stuff should be easier to write. And there is no popular or standard macro library that lets you do this, despite not being very hard to implement. However the existing way of doing it in the language should remain intact and distinct from the sugar. For example case, which allows constant expressions in its of branches, should not intersect with whatever is done here, where a "pattern" expression is required.

Beyond that, though, there are tons of discussions on this. Unfortunately I don't have a compilation of them, but you can look for them at https://github.com/nim-lang/RFCs/issues?q=is%3Aissue+is%3Aopen+pattern+matching or https://github.com/nim-lang/RFCs/issues?q=is%3Aissue+is%3Aopen+variant

Sidenote: Even if object variants are too "low level" for people, there's nothing wrong with the syntax IMO, and it fits the idea pretty well. Type sections in general, though, do have a dissonant syntax compared with the rest of the language: object needs to be indented but doesn't allow a colon, it doesn't allow semicolons between fields like named tuple types and proc arguments do, and enums allow mixing whitespace and commas to separate enum fields (which is nice but not pretty).

Araq [2022-11-26T16:23:17+01:00]

I don't mind the syntax of case objects and it works better than Rust's and Swift's solutions, all things considered.

A macro could be added to sugar.nim for the people who disagree and want more sugar.

I'm not a fan of a macro for this because my code simply contains too few object cases for it to matter. Syntax shortcuts should exist for common things.

There is std/wrapnils for chaining nullables etc. and it works better than Swift's solution IMHO.
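
A minimal sketch of what chaining with std/wrapnils looks like, assuming Nim 1.6 or later (the module is marked experimental and its behaviour differed in earlier releases). The Person and Address types are hypothetical.

```nim
import std/wrapnils

type
  Address = ref object
    city: string
  Person = ref object
    address: Address

var nobody: Person                                  # nil
var alice = Person(address: Address(city: "Kiel"))

# `?.` evaluates the chain and yields the default value of the final
# field's type ("" for string) as soon as a nil link is encountered.
echo ?.nobody.address.city
echo ?.alice.address.city
```

Unlike Swift's chaining, the result here is a plain string and not an optional, so an empty result does not distinguish "nil somewhere in the chain" from "city is empty".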

elcritch [2022-11-27T00:44:58+01:00]

I'm generally satisfied with the patty macro. It's in the first nimble query you gave and provides both the declaration and a match syntax:

https://github.com/andreaferretti/patty#constructing-variant-objects

variant Shape:
  Circle(r: float)
  Rectangle(w: float, h: float)
  UnitCircle

let c = Circle(r: 2.0)

let size = match c:
  Circle(r: r):
    r
  Rectangle(w: w, h: h):
    h
  UnitCircle:
    1.0

RodSteward [2022-11-27T01:01:27+01:00]

Patty looks pretty nice. Is there a chance that this can get into the standard sugar library?

xigoi [2022-11-27T08:12:13+01:00]

I'd prefer something that can integrate with the existing syntax instead of requiring a standalone block.

type
  Foo {.variant.} = object
    case kind
    of Alpha:
      i: int
    of Beta:
      f: float
    of Gamma:
      s: string

arnetheduck [2022-11-27T08:49:27+01:00]

> I don't mind the syntax of case objects and it works better than Rust's and Swift's solutions, all things considered.

There is a significant downside in Nim that frequently comes up when working with case objects: you cannot initialize a case object with the case data only, you need to instantiate the "shell" type:


type X = object
  case x: enum
  of valueA: a: int
  of valueB: b: int

let x = valueB(b: 42) # doesn't work - needs `X(x: valueB, b: 42)`

In the above, there's no way to create an X by referring only to the enum value and the "members" it has: you need to involve X. This prevents things like myVariant == valueA(a: 42), which is a significant problem when X is generic, for example Result[T, E].
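
A compilable version of the snippet above, as a sketch. The enum name XKind and the helper makeB are invented here, since the original leaves the enum anonymous.

```nim
type
  XKind = enum
    valueA, valueB

  X = object
    case x: XKind
    of valueA:
      a: int
    of valueB:
      b: int

# The enclosing type must be named and the discriminator spelled out.
let v = X(x: valueB, b: 42)

# A hand-written helper (hypothetical name) hides the shell type at call
# sites, but one such helper has to be written per branch.
func makeB(b: int): X = X(x: valueB, b: b)

echo v.b, " ", makeB(7).b
```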

Consider:


func f(): Result[int, string] =
  return Result[int, string](isOk: true, value: 42)

In the above example, what the "error" branch is doesn't matter when returning an "ok" value. The same goes for comparisons and other frequently hit use cases of variant objects (and for Option in std).
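
To make this concrete, here is a small self-contained sketch. The Result type below is a stand-in written for this example, not the definition from any particular library, and MyResult and g are hypothetical names.

```nim
type
  Result[T, E] = object
    case isOk: bool
    of true:
      value: T
    of false:
      error: E

  # An alias at least lets the type parameters be written only once.
  MyResult = Result[int, string]

func f(): MyResult =
  # The error type is irrelevant here, yet it is part of the constructor.
  MyResult(isOk: true, value: 42)

func g(): MyResult =
  MyResult(isOk: false, error: "boom")

echo f().value, " ", g().error
```

The alias shortens the call sites but does not remove the need to name the full generic type somewhere.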

This isn't "solvable" with case variant objects simply because they are overly loose: they allow members "outside" of the case, or indeed multiple case sections, whereas a "pure" enum object has only one "selection point" and can therefore afford a more pleasant experience when using it.

> my code simply contains too few object cases for it to matter.

This is a signal: the current case objects are not that useful due to their inherent limitations - that's why you don't see them used very often. A tagged enum like proposed above would likely see a lot more use.

radekm [2022-11-27T09:24:43+01:00]

> you cannot initialize a case object with the case data only, you need to instantiate the "shell" type

On the other hand the advantage of Nim case objects is that tags are first class. So you can pass only a tag to a function or change a tag if you remain in the same branch.
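
A small sketch of what "tags are first class" means in practice. The Shape type and cornerCount are hypothetical examples, and changing a tag in place is not shown.

```nim
type
  ShapeKind = enum
    skCircle, skRect

  Shape = object
    case kind: ShapeKind
    of skCircle:
      r: float
    of skRect:
      w, h: float

# The tag is an ordinary enum value, so it can be passed around
# without constructing or carrying a whole Shape.
func cornerCount(k: ShapeKind): int =
  case k
  of skCircle: 0
  of skRect: 4

let s = Shape(kind: skRect, w: 2.0, h: 3.0)
echo cornerCount(s.kind), " ", cornerCount(skCircle)
```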

> A tagged enum like proposed above would likely see a lot more use.

Why is that a good thing? Tag usually means you need branching => so it will be slow if you have lots of them.

Araq [2022-11-27T09:41:57+01:00]

> This is a signal: the current case objects are not that useful due to their inherent limitations - that's why you don't see them used very often.

No, it's because a type section is used 1390 times in the stdlib whereas proc is used 10741 times.

ElegantBeef [2022-11-27T09:51:08+01:00]

> This isn't "solvable" with case variant objects simply because they are overly loose

Well, this is solvable: if you assign a field that can only occur in a single delimited branch, the compiler can infer the discriminator value you want to supply. Otherwise it could raise an error listing the possible values. This is also not commentary on the actual types but on how they're constructed, which I would argue discredits your point.

> that's why you don't see them used very often

That seems like a purposely biased sentiment, that has no evidence. Almost every complex library will use them at least once, sometimes even more!

I do think object variants have an ergonomics issue, but it's mainly on the declaration. Manually creating an enum per branch and not just emitting an Enum like @xigoi has demonstrated is the main crux in my view (Ostensibly a NodeKind should be declared with a Node type...).

ElegantBeef [2022-11-27T21:54:56+01:00]

> therefore limit the ability to reason about them in generic code, macros, etc. This is where the lack of ergonomics comes from

I still think this is putting the cart before the horse. It's not hard to imagine what you want working with Nim object variants. With a change to the compiler, all of the following could be valid:

type X = object
  case x: enum
  of valueA: a: int
  of valueB: b: int

match X()
of X(@a):
  echo a
of X(@b):
  echo b


let x = X(b: 42)

type MyResult = Result[int, string]


func f(): MyResult = MyResult(value: 42)


type MyComplexType = object
  a, b: string
  case c: bool
  of true:
    d, e: int
  of false:
    case otherField: 0..3 # Too lazy for an enum here
    of 0, 1:
      f, g: float
    of 2:
      h: string
    else:
      discard

match MyComplexType(h: "hello") # Hey the compiler can reason this!
of MyComplexType(@h):
  echo h
of it = MyComplexType(@f, myField = @g):
  echo it, " ", f, " ", myField, " ", it.otherField
else: discard

var a = MyComplexType(f: 0) # Error 'otherField' can be `0` or `1`

Araq [2022-11-28T08:43:40+01:00]

> Note that constraining objects to a single case and no "extra" fields (like tagged enums do) leads to no loss of generality in what you can express

It does lead to a loss of efficiency though as in many important cases the discriminator is a single byte that can be combined into a word with some "flags" field. But that is not possible when the object is deconstructed into a tuple of sum types.

arnetheduck [2022-11-28T09:37:57+01:00]

> It does lead to a loss of efficiency though as in many important cases the discriminator is a single byte that can be combined into a word with some "flags" field.

How do you mean? Because of alignment, or by doing magic optimizations? The latter would have ABI implications, and there are two cases:

either you have an undefined ABI, in which case the compiler is free to generate any code it likes and a composed object can use whatever field order and other tricks to "compact" things without violating alignment (i.e. it can even "flatten" a composed object so you get the same C structs as today)

...or you have a defined ABI, in which case the developer will have to order their fields in a particular way in order to get a desired compaction

I have a preference for the former, in general - it would be nice if objects could be tagged "abi: c" in which case they follow the (fairly) well-established C ABI, otherwise leaving the compiler to reorder and optimize as it sees fit - this ABI freedom would be a huge benefit to nlvm when it comes to efficiency tricks like this - the C backend could also do many of them.

Araq [2022-11-28T09:56:18+01:00]

Due to alignment. And it's hard to gain it back because the sum type is reified and can be used independently from where it is embedded. Consider:

type
  SomeEnum = enum
    strVal, intVal, nothing
  Node = object # size: 3 words
    flags: uint8
    case e: SomeEnum # merged with flags into a machine word
    of strVal:
      s: string
    of intVal:
      i: int
    else:
      discard

vs.

type
  Branches = enum
    strVal(s: string)
    intVal(i: int)
    nothing

  Node = object # size: 4 words
    flags: uint8
    b: Branches

In theory you can flatten it. In practice there will be code that uses the Branches type which implies you have to unflatten it sometimes which makes the optimization much less useful.
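
The size difference can be checked in current Nim by emulating the second layout with a nested case object, since enums with payloads do not exist. This is a sketch with invented type names (FlatNode, NestedNode); the absolute sizes depend on the platform and on the memory management mode, so only the comparison is meaningful.

```nim
type
  SomeEnum = enum
    strVal, intVal, nothing

  FlatNode = object
    flags: uint8
    case e: SomeEnum       # one byte, shares a word with flags
    of strVal:
      s: string
    of intVal:
      i: int
    else:
      discard

  Branches = object        # stands in for the reified sum type
    case e: SomeEnum
    of strVal:
      s: string
    of intVal:
      i: int
    else:
      discard

  NestedNode = object
    flags: uint8           # padded out to the alignment of Branches
    b: Branches

echo "flat:   ", sizeof(FlatNode)
echo "nested: ", sizeof(NestedNode)
```

Because Branches is a type of its own, its tag cannot share a word with flags, and the padding before b is what costs the extra word.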

GavinRay [2022-12-13T00:50:06+01:00]

Sorry to dredge up a two-week old post, just wanted to say I think this is a very important topic

Particularly for anyone who wants to write things like interpreters, expression/query languages, etc. Having an ergonomic representation for ADT's makes a world of difference there.

arnetheduck [2022-12-13T08:07:19+01:00]

> In theory you can flatten it.

Not only flatten, but also reorder the fields (by size roughly) - ie these are two "common" optimizations outside of C/C++ that I think we could adopt in Nim, but that would require said ABI feature.

> In practice there will be code that uses the Branches type

Compared to the status quo, this is a new capability that you gain when you have split the type - ie existing code cannot do this simply because the code is not factored that way, and if you do factor the code this way (because you want to be able to write functions for the "branches" part alone), you already have to create a separate type, and thus run into the same problem.

Basically, we can add tagged types to the language without removing case objects - the latter would serve for the special case that you outline, until we get ABI flexibility.

Araq [2022-12-13T08:23:17+01:00]

> Particularly for anyone who wants to write things like interpreters, expression/query languages, etc. Having an ergonomic representation for ADT's makes a world of difference there.

Well I'm one who has written things like interpreters, expression/query languages and not one "who wants to". And let me tell you: No. It does not make a world of difference when you already have case objects.

deech [2022-12-13T15:28:13+01:00]

I'm not too worried about ADTs vs. case objects, but I do think built-in object destructuring and exhaustiveness checking with good diagnostics would actually make a lot of difference. Case statement macros can give you the first, but the rest are pretty awkward.

Araq [2022-12-13T16:46:43+01:00]

We have exhaustiveness checking with good diagnostics.
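
As an illustration of the check being referred to, a case statement over an enum has to cover every value. NodeKind and describe below are hypothetical names used only for this sketch.

```nim
type
  NodeKind = enum
    nkInt, nkStr, nkNil

func describe(k: NodeKind): string =
  # Every enum value must be handled here (or an else branch given).
  # Deleting one of the branches below makes compilation fail, with the
  # compiler reporting that not all cases are covered.
  case k
  of nkInt: "int"
  of nkStr: "str"
  of nkNil: "nil"

echo describe(nkStr)
```

This covers the discriminator values only; it does not check nested patterns, which is the part deech describes as awkward.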

Mirror of forum.nim-lang.org

9659 :: Plans for improving tagged enum (ADT) syntax?