nimforum mirror - Object variants: private discrimiator for safety's sake

twetzel59 (orginal) [2018-01-31T01:27:11+01:00] view original

One of the things that I currently find very concerning about object variants is how they can introduce runtime errors when the discriminator is assigned to in an incorrect manner. As it turns out, there seems to be a compiler bug that causes not only exceptions, but undefined behavior when the discriminator is changed. Also see this RFC.

I always prefer catching these sorts of problems at compile time, and it occurred to me that making the discriminator private could offer a solution in some cases.

type
  Variants* = enum
    vaCow, vaPig, vaHorse
  
  MyObj = object
    case kind: Variants
    of vaCow:
      age*: float
    of vaPig:
      hunger*: Natural
    of vaHorse:
      name*: string

proc newMyObj*(age: float): MyObj =
  MyObj(kind: vaCow, age: age)

proc kind*(o: MyObj): Variants =
  o.kind

So, do you think that this is an idomatic/Nim-esque solution? I feel that it solves the problem of safety when the "kind" will be determined by a constructor proc and remain constant, as in the example.

I am tempted to use this pattern often, as I highly value memory safety. Will I encounter pitfalls/confusion with this, or is it an acceptable workaround?

cdome (orginal) [2018-01-31T18:18:53+01:00] view original

IMO, Nim forbids assignment to discriminant fields anyway. Check is disabled in the release mode, but it is on by default in debug.

twetzel59 (orginal) [2018-02-01T01:26:35+01:00] view original

There seems to be a bug, however, by which this check doesn't catch such an error, even in debug mode.

For example, try the following example.

type
  Kind = enum
    Zeroth,
    First,
  
  Thing = object
    case kind: Kind
    of Zeroth:
      f: float
    of First:
      i: int

var t = Thing(kind: Zeroth, f: 3.14)

t.kind = First
echo t.i

The code erroneously interprets the float as an int! This is a major safety issue.


Output: 4614253070214989087

I believe that this is a bug (see above issues), because the inverse situation:

var t = Thing(kind: First, i: 123)

t.kind = Zeroth
echo t.f

raises an exception as it should. In fact, I believe (don't quote me on this, but I think I'm understanding correctly), that this is caused by the FieldDiscriminantCheck proc in system/assign.nim!

The proc is fairly simple:

proc FieldDiscriminantCheck(oldDiscVal, newDiscVal: int,
                            a: ptr array[0x7fff, ptr TNimNode],
                            L: int) {.compilerProc.} =
  var oldBranch = selectBranch(oldDiscVal, L, a)
  var newBranch = selectBranch(newDiscVal, L, a)
  if newBranch != oldBranch and oldDiscVal != 0:
    sysFatal(FieldError, "assignment to discriminant changes object branch")

So this would explain how this bug comes about. The first listed enum variant will be encoded as 0. Here this is Zeroth. When attempting to change the branch from First to Zeroth, both conditions pass, and the exception is properly raised. Yet, in the broken example, the second condition fails.

I am not sure why this condition is here. I assume that there is some significance of a discriminant of 0, but it seems to cause a conflict with the enum encoding, and it definitely breaks memory safety.

This may be really easy or really hard to fix...

Araq (orginal) [2018-02-01T09:57:54+01:00] view original

It is reasonably easy to fix but adds 1 byte to case objects. The cause is that code like n.kind = First; t.i = 8 needs to work, so transitions from the "zero" state have be be allowed. Alternatively we can require full object constructions and disallow all assignments to the discriminator. This would be pretty much how Rust solves this issue. The cost is flexibility.

cdome (orginal) [2018-02-01T12:02:21+01:00] view original

Just as an idea, the flexible solution would be to able set discriminant if you provide the new value simultaneously with discriminant and reject at compile time otherwise.

Something like:

var t = Thing(kind: Zeroth, f: 3.14)

t.kind = First # not compiles
t.i = 12 # fails runtime check, compile time warning warnProveField


(t.kind, t.i) = (First, 12) # successful

If it is difficult to achieve just disallow all assignments to the discriminant.

cdome (orginal) [2018-02-01T13:11:22+01:00] view original

@twetzel59, there is extra warning you can enable that you can find useful: warnProveField

Araq (orginal) [2018-02-01T18:40:55+01:00] view original

Another solution is to allow object case transitions when the object consists of binary zeros which is what reset does. reset is documented to enable object case transitions. This would not require a hidden wasInitialized bool.

twetzel59 (orginal) [2018-02-01T20:14:48+01:00] view original

That could work. I would be concerned about something though: will binary zero always mean that it is safe to change branches? It is possible that a varient may be perfectly valid as zero, even when initialized. An example could explain better:

type
  Variant = enum
    Point, NamedPoint
  
  Point2d = object
    case kind: Variant
    of Point:
      x, y: float
    of NamedPoint:
      name: string
      nx, ny: float

Just having the point at (0, 0) doesn't mean that the programmer intends for a Point to be reinterpreted as a NamedPoint. In fact, if you have, for example, some game entities, then those at the origin would have different assignment properties. I'm afraid that this could be surprising, if not dangerous. So, I like the idea, but I fear that it could create unexpected corner cases.

I have an idea, and although it's a bit silly/simple, it could be advisable. Maybe we could urge developers to have a "nil case" that has no fields, especially in more critical code. That would place the burden on the programmer, but it's also very deliberate. The object could only be initialized into this state, and then the branch would be set in stone until reset:

type
  Variant = enum
    None, Point, NamedPoint
  
  Point2d = object
    case kind: Variant
    of None: discard
    of Point:
      x, y: float
    of NamedPoint:
      name: string
      nx, ny: float

var p = Point2d(kind: None)
p.kind = Point
p.x = 3.0
p.y = 2.0

echo p.x, ' ', p.y

# Throws FieldError, as intended.
p.kind = NamedPoint

This seems like a workaround in some ways, though, and current code does not do this. It would be better if the language could enforce the variants without such a construction. I do understand the argument about flexibility. Nim developers are almost certainly going to assign to different fields, including the discriminant, after obtaining a new, zeroed object.

Mirror of forum.nim-lang.org

3515 :: Object variants: private discrimiator for safety's sake