nimforum mirror - Plans regarding named tuples

bluenote [2017-02-27T09:11:13+01:00]

This is probably mainly a question for Araq, but maybe others are familiar with the plans as well. Araq mentioned in a pull request that named tuples might be removed from the language. What is the motivation for this, and what would the consequences be? Will there be an equivalent mechanism that allows ad-hoc type creation, i.e., would named tuples be replaced by something like anonymous objects? Or are there other ways to handle this without breaking existing code?

I'm asking because what I'm currently working on heavily relies on named tuples (they are a key language feature to achieve my mission with NimData: type-safe schema transformation), and I'm not sure if the DSL I'm designing would work without them.

Araq [2017-02-27T09:28:58+01:00]

Well, object is the obvious replacement, but this indeed does not cover all the use cases of named tuples. I would like to remove named tuples to simplify the language. Maybe this means object needs to grow minor new capabilities.

> I'm not sure if the DSL I'm designing would work without them.

Please elaborate.

bluenote [2017-02-27T20:58:40+01:00]

> Please elaborate.

I'm mainly aware of the situation in Scala, where the lack of named tuples is the reason why type-safe schema transformation is rather limited. When working with typed data, there are basically two options:

Use unnamed tuples, which is not really an option, because it either hard-codes column positions (=> unreadable) or requires tedious pattern matching over all fields.

Use (case) classes, which is the standard solution: you write out a case class, which involves typing out all the field names/types once. The problem is that transforming the data cannot be done automatically. Let's assume the input data has 30 columns, so we have to write a first class RawInput with 30 fields. In a later processing step we might want to remove a few columns. This again requires defining a new class ReducedInput with 20+ fields. Eventually we might want to add a bunch of derived columns, and again we have to introduce a new type. The problem can be mitigated by inheritance/traits, but it still remains a work-around that is not very convenient to work with.

In Nim, the same can be solved very elegantly by just transforming/constructing named tuples everywhere. That's what the DSL looks like:

# A const schema definition is required once. Ideally this is the
# only point where we have to type out our 30 columns.
const schema = ... # array with field information

# From here on, it is just a bunch of macros performing named tuple transformations
let df = DF.fromText("test.csv")
           .map(schemaParser(schema, ";"))

# Projection can use whichever is shorter to type
df.map(t => t.projectAway(fields, to, remove))
df.map(t => t.projectTo(fields, to, keep))

# Adding new fields also does not require repeating existing fields
df.map(t => t.addFields(length: sqrt(t.x^2 + t.y^2)))

# Eventually even the schema of a join can be computed statically:
let joined = dfA.join(dfB, on=[joinField])

This should also play nicely with structural typing in Nim, e.g., passing data frames to functions can be done generically, and does not require writing out field names explicitly.
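To make the structural-typing point a bit more concrete, here is a minimal sketch in plain Nim. It is not NimData code: `norm` and the field names `x`, `y`, `label`, and `length` are made up for illustration. It only shows that a generic proc can accept any named tuple with the right fields, and that a derived row can be built without declaring a new type.

```nim
import math

# Hypothetical helper: compiles for any row type T with numeric fields x and y.
proc norm[T](row: T): float =
  sqrt(row.x * row.x + row.y * row.y)

# Two different named tuple types; neither is declared anywhere.
let a = (x: 3.0, y: 4.0)
let b = (x: 6.0, y: 8.0, label: "p")

echo norm(a)
echo norm(b)

# "Adding a field" by hand: the result is a new ad-hoc tuple type.
let c = (x: a.x, y: a.y, length: norm(a))
echo c.length
```

The DSL macros above would generate this kind of tuple construction automatically, so the existing fields would not have to be repeated.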

I'm not sure how this would work with objects. Since they are nominal, I guess they would have to be made explicitly available in the outer scope. Currently I leave it up to the user if they want to define their types explicitly, for instance via this macro:

type
  MyRowType = schemaType(schema)

proc myExplicitlyTypedProc(df: DataFrame[MyRowType]) = ...

What I wanted to avoid is that a user has to explicitly name their types for each transformation.

bluenote [2017-02-27T21:28:33+01:00]

Ah, maybe my assumptions were just wrong. All I knew was that something like this is not possible:

# can't use `Anonymous` as return type
proc test(): Anonymous =
  type Anonymous = object
    x: int
    y: int
  result = Anonymous(x: 0, y: 0)

But it looks like objects can already be anonymous, because this seems to work:

proc test(): auto =
  type Anonymous = object
    x: int
    y: int
  result = Anonymous(x: 0, y: 0)

Not being able to write out the type explicitly feels weird though. And things get messy when structural equivalence matters:

proc testA(): auto =
  type Anonymous = object
    x: int
    y: int
  result = Anonymous(x: 0, y: 0)

proc testB(): auto =
  type Anonymous = object
    x: int
    y: int
  result = Anonymous(x: 0, y: 0)

# this works
echo testA() == testA()

# this doesn't
echo testA() == testB()

So maybe objects aren't too far from being able to replace named tuples already. But I have to say that I still like the simplicity of named tuples a lot and will surely miss the nice syntax ;).
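For contrast, here is a small sketch of the same comparison written with named tuples (the proc names `tupA` and `tupB` are made up for illustration). Since tuple types are structural, the return type can be written out explicitly, and two independently defined procs still return compatible values:

```nim
# The return type can be spelled out explicitly...
proc tupA(): tuple[x: int, y: int] =
  (x: 0, y: 0)

# ...or inferred; both procs end up with the same structural type.
proc tupB(): auto =
  (x: 0, y: 0)

# This compiles, unlike the comparison of the two local object types above.
echo tupA() == tupB()
```

This is the property a replacement based on objects would presumably need to preserve for the schema transformations to keep working.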

dom96 [2017-02-27T21:29:31+01:00]

What is the rationale for removing named tuples?

bpr [2017-02-27T21:53:29+01:00]

> What is the rationale for removing named tuples?

Mentioned above, to simplify the language. They're largely redundant with objects.

I think that method should get the boot too, but that might be a bit much.

Krux02 [2017-02-28T14:37:48+01:00]

Well, I guess named tuples have to be replaced by anonymous objects then. I am not sure if that changes anything.

Mirror of forum.nim-lang.org