This is probably mainly a question for Araq, but maybe others are familiar with the plans as well. Araq mentioned in a pull request that named tuples might be removed from the language. I'm wondering what the motivation for this is, and what the consequences will be. Will there be an equivalent mechanism that allows ad-hoc type creation, i.e., would named tuples be replaced by something like anonymous objects? Or are there other ways to handle this without breaking existing code?
I'm asking because what I'm currently working on heavily relies on named tuples (they are a key language feature to achieve my mission with NimData: type-safe schema transformation), and I'm not sure if the DSL I'm designing would work without them.
Well, object is the obvious replacement, but this indeed does not cover all the use cases of named tuples. I would like to remove named tuples to simplify the language. Maybe this means object needs to grow minor new capabilities.
I'm not sure if the DSL I'm designing would work without them.
Please elaborate.
I'm mainly aware of the situation in Scala, where the lack of named tuples is the reason why type-safe schema transformation is rather limited. When working with typed data, there are basically two options:
In Nim, the same problem can be solved very elegantly by just transforming/constructing named tuples everywhere. This is what the DSL looks like:
# A const schema definition is required once. Ideally this is the
# only point where we have to type out our 30 columns.
const schema = ... # array with field information
# From here on, it is just a bunch of macros performing named tuple transformations
let df = DF.fromText("test.csv")
           .map(schemaParser(schema, ";"))
# Projection can use whichever is shorter to type
df.map(t => t.projectAway(fields, to, remove))
df.map(t => t.projectTo(fields, to, keep))
# Adding new fields also does not require repeating existing fields
df.map(t => t.addFields(length: sqrt(t.x^2 + t.y^2)))
# Eventually even the schema of a join can be computed statically:
let joined = dfA.join(dfB, on=[joinField])
This should also play nicely with structural typing in Nim, e.g., passing data frames to functions can be done generically and does not require writing out field names explicitly.
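To make the structural-typing point concrete, here is a minimal sketch using only plain named tuples. It is not NimData code: `Row`, `withLength`, and `maxLength` are hypothetical names, and `withLength` is a hand-written stand-in for what an `addFields`-style macro might generate.

```nim
import std/math

# Hypothetical row type; in the DSL it would be derived from the schema.
type Row = tuple[x: float, y: float]

# Hand-written stand-in for an `addFields`-style transformation:
# copy the existing fields and append the new one.
proc withLength(t: Row): tuple[x: float, y: float, length: float] =
  (x: t.x, y: t.y, length: sqrt(t.x^2 + t.y^2))

# Generic over any tuple type; it only relies on a `length` field,
# which is checked when the proc is instantiated.
proc maxLength[T: tuple](rows: openArray[T]): float =
  for r in rows:
    result = max(result, r.length)

# The literals below are never declared as `Row`, but they match it structurally.
let rows = [(x: 3.0, y: 4.0), (x: 6.0, y: 8.0)]
var extended: seq[tuple[x: float, y: float, length: float]] = @[]
for r in rows:
  extended.add withLength(r)

echo maxLength(extended)
```

No row type has to be named at the call sites: the tuple literals are accepted wherever a tuple with the same field names and types, in the same order, is expected.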
I'm not sure how this would work with objects. Since they are nominal, I guess they would have to be made explicitly available in the outer scope. Currently I leave it up to the user if they want to define their types explicitly, for instance via this macro:
type
  MyRowType = schemaType(schema)
proc myExplicitlyTypedProc(df: DataFrame[MyRowType]) = ...
What I wanted to avoid is that a user has to explicitly name their types for each transformation.
Ah, maybe my assumptions were just wrong. All I knew was that something like this is not possible:
# can't use `Anonymous` as return type
proc test(): Anonymous =
  type Anonymous = object
    x: int
    y: int
  result = Anonymous(x: 0, y: 0)
But it looks like objects can already be anonymous, because this seems to work:
proc test(): auto =
  type Anonymous = object
    x: int
    y: int
  result = Anonymous(x: 0, y: 0)
Not being able to write out the type explicitly feels weird though. And things get messy when structural equivalence matters:
proc testA(): auto =
  type Anonymous = object
    x: int
    y: int
  result = Anonymous(x: 0, y: 0)

proc testB(): auto =
  type Anonymous = object
    x: int
    y: int
  result = Anonymous(x: 0, y: 0)

# this works
echo testA() == testA()
# this doesn't
echo testA() == testB()
So maybe objects aren't too far from being able to replace named tuples already. But I have to say that I still like the simplicity of named tuples a lot and will surely miss the nice syntax ;).
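For comparison, a small sketch of the same two procs written with named tuples. Here the return type can be spelled out, and the two results compare fine because tuple types with the same field names and types in the same order are equivalent:

```nim
proc testA(): tuple[x: int, y: int] =
  result = (x: 0, y: 0)

proc testB(): tuple[x: int, y: int] =
  result = (x: 0, y: 0)

# Compiles: both procs return the same structural type.
echo testA() == testB()
```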
What is the rationale for removing named tuples?
As mentioned above, to simplify the language. They're largely redundant with objects.
I think that method should get the boot too, but that might be a bit much.