I have an object where I need to store some value. I do not know the concrete type of the value, though, only that it conforms to a given interface. Hence I have used a concept for the value.
Since this is not a concrete type, its size is unknown, and I cannot store it directly inside an object, but I can store a reference to it. The problem arises when I want to allocate memory for this reference. At that point I know the concrete type, but I could not find a way to call new that will work.
Case in point: I want to be able to compose layers in a neural network. Hence I have defined the types as follows
type
  Layer*[M, N: static[int]] = concept c
    forward(c, Vector64[M]) is Vector64[N]

  LinearLayer*[M, N: static[int]] = object
    bias: Vector64[N]
    weights: Matrix64[N, M]

  SequenceLayer*[M, N, P: static[int]] = object
    first: ref Layer[M, N]
    second: ref Layer[N, P]
What I want to express here is that I can compose a layer with M inputs and N outputs with a layer with N inputs and P outputs to get a layer with M inputs and P outputs. So far, so good: I am able to define forward for a LinearLayer and I can check that instances of LinearLayer are in fact instances of Layer.
proc forward*[M, N: static[int]](l: LinearLayer[M, N], v: Vector64[M]): Vector64[N] =
  l.weights * v + l.bias

proc linear*[M, N](bias: Vector64[N], weights: Matrix64[N, M]): LinearLayer[M, N] =
  result.bias = bias
  result.weights = weights

let
  l1 = linear(randomVector(6), randomMatrix(6, 8))
  l2 = linear(randomVector(7), randomMatrix(7, 6))

echo l1 is Layer[8, 6] # true
echo l2 is Layer[6, 7] # true
Now, I would like to be able to compose the two layers together and get a SequenceLayer. My tentative definition goes like this:
proc `=>`*[M, N, P: static[int]](first: Layer[M, N], second: Layer[N, P]): SequenceLayer[M, N, P] =
  var rfirst = new(type(first))
  var rsecond = new(type(second))
  result.first[] = rfirst
  result.second[] = rsecond
let nn = l1 => l2
When I try to compile this, I get FAILURE: Execution failed with exit code 1 without any further indication. What would be the right way to allocate space inside =>?
Compilation is failing, without any indication why. Unfortunately, I have Nim compiled with -d:release. I will try to post the output with a non release compiler, but if you happen to have one ready, this is what I am compiling
neurotic.nimble
[Package]
name = "neurotic"
version = "0.1.0"
author = "Andrea Ferretti"
description = "Neural networks for Nim"
license = "Apache2"
bin = "neurotic"
[Deps]
Requires: "nim >= 0.11.2,linalg >= 0.1.3"
neurotic.nim
import linalg

type
  Layer*[M, N: static[int]] = concept c
    forward(c, Vector64[M]) is Vector64[N]

  LinearLayer*[M, N: static[int]] = object
    bias: Vector64[N]
    weights: Matrix64[N, M]

  SequenceLayer*[M, N, P: static[int]] = object
    first: ref Layer[M, N]
    second: ref Layer[N, P]

{. push warning[SmallLshouldNotBeUsed]: off .}

proc forward*[M, N: static[int]](l: LinearLayer[M, N], v: Vector64[M]): Vector64[N] =
  l.weights * v + l.bias

proc forward*[M, N, P: static[int]](l: SequenceLayer[M, N, P], v: Vector64[M]): Vector64[P] =
  l.second.forward(l.first.forward(v))

{. pop .}

proc `=>`*[M, N, P: static[int]](first: Layer[M, N], second: Layer[N, P]): SequenceLayer[M, N, P] =
  var rfirst = new(type(first))
  var rsecond = new(type(second))
  result.first[] = rfirst
  result.second[] = rsecond

proc linear*[M, N](bias: Vector64[N], weights: Matrix64[N, M]): LinearLayer[M, N] =
  result.bias = bias
  result.weights = weights

when isMainModule:
  let
    v = randomVector(8)
    l1 = linear(randomVector(6), randomMatrix(6, 8))
    l2 = linear(randomVector(7), randomMatrix(7, 6))
    # nn = l1 => l2

  echo l1 is Layer[8, 6]
  echo l2 is Layer[6, 7]
  echo l2.forward(l1.forward(v))
If you uncomment the commented line, the compiler crashes.
Well, then I am not sure how to implement what I am trying to do here. What I would like to model is layers of neural networks and combinators among them.
A layer will take a Vector64[M] as input and output a Vector64[N] for some fixed M and N (let me forget for a minute about backpropagation). I would like to have many layer implementations (linear layers, convolutional layers, various nonlinearities...) provided they conform to a common interface, which for now is given by the existence of just the forward function.
I also would like to combine two layers together - when the dimensions match - in order to form their composition (and possibly have some more complex combinators). To do so, my plan was to use a type SequenceLayer that would essentially just remember the two layers and apply them in turn to the input vector.
But I need to store those layers somewhere. I also do not know a priori what layers will be combined. Hence my definition
type SequenceLayer*[M, N, P: static[int]] = object
  first: ref Layer[M, N]
  second: ref Layer[N, P]
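To make the intent concrete, this is how I would like composition to look in use (hypothetical, since `=>` does not compile yet):

let
  l1 = linear(randomVector(6), randomMatrix(6, 8))   # a Layer[8, 6]
  l2 = linear(randomVector(7), randomMatrix(7, 6))   # a Layer[6, 7]
  nn = l1 => l2                                      # would be a Layer[8, 7]

echo nn.forward(randomVector(8))                     # would print a Vector64[7]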
If this strategy has no chance to work, how would I go about implementing layer composition? At this moment I do not have any good idea.
By the way, I think the same issue would happen in any situation where one wants to create an algebra of combinators, be it for parsing, composing HTTP routes and so on.
@Varriount: the reason is the same as in my other answer. Actually, all dimensions are known and inferred at compile time.
Also, I need to work with actual arrays instead of sequences because the linalg library passes them to BLAS operations behind the scenes, so I need the storage to be actually contiguous.
@Varriount: thank you, I did not know. I assumed that sequences were implemented as a linked list of big blocks or something like that. Still, the other reasons remain (having help from the compiler in tracking down dimensions).
@Araq: thank you, I will see whether I am able to store those inside a closure
andrea: Trying to have all length calculations predetermined at compile time is going to be very difficult, if not impossible. Take for example:
# Populate a sequence with layers
var layerSeq = newSeq[ref BaseLayer]()
layerSeq.add(newLayer(1, 20))
layerSeq.add(newLayer(107, 54))
layerSeq.add(newLayer(30, 30))
# Shuffle the sequence
layerSeq.shuffle()
layerSeq[0] # Ok, what are the static integers in our layer?
The moment non-deterministic behavior is introduced, or a code path with a large number of branches, static information is either going to be lost, or the compiler will have to generate an increasing amount of backend code. Using closures is only going to do implicitly what would normally be done explicitly: store the lengths of the arrays at runtime. This is because (current) closures work via a procedure-pointer/environment-pointer pair, with the environment pointer leading to a structure that stores all the outer variables and captured information.
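As a rough, hedged sketch (using plain seq[float] instead of the linalg types, and a hypothetical DynLayer alias), this is what composing layers through closures amounts to; the captured procs live in the closure environment, with all sizes handled at runtime:

type DynLayer = proc (v: seq[float]): seq[float]

# compose returns a closure that captures first and second in its
# environment and applies them in turn when called
proc compose(first, second: DynLayer): DynLayer =
  result = proc (v: seq[float]): seq[float] =
    second(first(v))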
@Varriount: yes, of course it will not be possible to shuffle layers of the network.
This is not really an issue: usually when constructing a neural network the structure is known and fixed before starting the training. I will not even expose a way to create networks dynamically via a sequence of layers (although an array of homogeneous layers should work, for instance when unrolling a recurrent neural network).
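For example, a homogeneous stack could live in a plain array; a hedged sketch (hypothetical type, nothing I have actually written yet):

# K layers of the same concrete type L, e.g. K copies of a LinearLayer[N, N]
type RepeatedLayer*[K: static[int], L] = object
  steps*: array[K, L]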
Actually I am not sure why you are trying to discourage me from using static information. I thought the whole point of static[T] was to allow encoding more complex invariants inside types, as in dependently typed languages. In fact, this is the killer feature that attracted me to Nim in the first place.
I already started developing a linear algebra library that makes use of this, and I am in the process of making it work on the GPU via CUDA. If you look at the tests you can see that it is already working quite well. The types are inferred, invalid examples do not compile, and the syntax is no heavier than numpy (at least with 64 bits).
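For instance (a minimal sketch using the same constructors as above; the commented-out line is one of the invalid examples that gets rejected):

import linalg

let
  m = randomMatrix(7, 6)   # Matrix64[7, 6]
  v = randomVector(6)      # Vector64[6]

echo m * v                 # a Vector64[7]; dimensions are checked at compile time
# echo m * randomVector(8) # does not compile: dimensions do not match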
On top of that, I am planning to write a neural network library and a clustering library.
Of course, you are better informed than me on the evolution of Nim. For instance, if there are plans to deprecate static[T], I would like to know early, because I rely on that feature a lot. Otherwise, I am quite happy with how things are going, and I do not have plans to rewrite everything keeping track of lengths dynamically.
Actually, for some use cases such block-based sequences may make sense now as well, as resizing is always constant time, while relocatable arrays require copies once in a while.
Ok, but if the size of the vector is doubled as it grows, the total cost of copying over n appends is only O(n) (n + n/2 + n/4 + ... < 2n), i.e. amortized constant time per append.
For instance, if there are plans to deprecate static[T], I would like to know early, because I rely on that feature a lot.
No, don't worry. static[T] is only going to get more stable.
I tried to look at this issue a bit, and I think I found some useful testcases:
This code gives an internal error:
type
  SomeConcept = concept c
    dosomething(c)

  TObj[N: static[int]] = object
    x: array[N, int]

var
  o: ref TObj[3]
  o2: ref SomeConcept

new(o)

proc dosomething(f: TObj) =
  echo "something ", len(f.x)

proc metado*(f: SomeConcept) =
  dosomething(f)

metado(o)
And this code compiles, but probably shouldn't:
type
  SomeConcept[P: static[int]] = concept c
    dosomething(c)

  TObj[N: static[int]] = object
    x: array[N, int]

  TCar[T: static[int]] = object
    y: ref SomeConcept[T]

var
  o: ref TObj[3]

new(o)

proc dosomething(f: ref TObj) =
  echo "something ", len(f.x)

proc metado*(f: SomeConcept) =
  dosomething(f)

metado(o)
I'll create a GitHub issue unless it's already covered by one.
andrea: You want everything to compile statically, right? That means you should know at compile time what types a given SequenceLayer contains. You don't have to store a concept, you just need to store some known type that you want to conform to some concept.
That is, you'd like to be able to do something like this:
SequenceLayer*[F: Layer[M,N], S: Layer[N,P]] = object
  first: F
  second: S
If this were possible, it would be incredibly cool. You'd have the whole structure of your neural network encoded in the type information of the root type of the neural net ^_^
Unfortunately, this doesn't seem possible atm. For me it fails with "Error: undeclared identifier: 'M'"
Is there some other way to achieve this, or will it be possible in the future?
type SequenceLayer*[M, N, P: static[int], F: Layer[M,N], S: Layer[N,P]] = object
  first: F
  second: S

proc `=>`*[M, N, P: static[int], F: Layer[M,N], S: Layer[N,P]](first: F, second: S): SequenceLayer[M, N, P, F, S] =
  result.first = first
  result.second = second
One would then hope that the type parameters are inferred, since writing them out would be rather cumbersome.
Unfortunately, the compiler cannot infer (at least trivially) the types in this situation. In fact, say F is LinearLayer[6, 7]. Then we know (and the compiler knows as well) that F belongs to Layer[6, 7], but there is no reason F could not also belong to Layer[M, N] for other values of M and N. Hence the compiler cannot infer the static[int] that are type parameters of Layer.
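To make this concrete, here is a hypothetical layer (not part of my code) that matches the concept for infinitely many parameters:

# A layer that is a Layer[M, M] for every M: no unique static parameters
# can be read off the type itself.
type IdentityLayer* = object

proc forward*[M: static[int]](l: IdentityLayer, v: Vector64[M]): Vector64[M] = v

echo IdentityLayer() is Layer[3, 3] # true
echo IdentityLayer() is Layer[7, 7] # also true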
What we could do is guide the compiler inference. Something like this
proc dimIn[M, N: static[int]](l: LinearLayer[M, N]): int = M
proc dimOut[M, N: static[int]](l: LinearLayer[M, N]): int = N
and then try to use these values as parameters
proc `=>`*[F, S](first: F, second: S): SequenceLayer[dimIn(first), dimOut(first), dimOut(second), F, S] =
  result.first = first
  result.second = second
This also fails, because the type parameters for the result depend on the input values! Actually they only depend on the types of the inputs, but the compiler does not know that.
At this point I am stuck. I do not know how to express dimIn and dimOut as depending only on the type.
I think I will try to use the object-oriented features of Nim instead of concepts to see if I am able to overcome this.
(By the way, once I finish adding CUDA operations to linalg, I will also add dynamically sized vectors, and hence dynamically sized layers, which are useful especially for input layers, where the dimension may not be known a priori. But I would still like to pursue the direction of being able to express statically sized layers.)
This error seems reasonable: M, N and P are unbound in your signature.
That's true, but couldn't the compiler be smart enough to understand that M is unbound, and bind it the first time it sees it? Then, the next time it sees M, it could check that the two occurrences match.
I think there's a real need to be able to deal well with nested generic types.
A similar solution would be something along those lines, where * would be some way of telling the compiler that S's N does not have to match anything.
It really shouldn't be necessary to pass M, N and P as type parameters. If you're instantiating SequenceLayer with a Layer, then M and N are already specified by Layer. The only other thing the compiler needs to be able to do is verify that S's input dimension equals F's output dimension.
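Concretely, the check should reject a mismatched composition like this one (hedged sketch, reusing the linear constructor from above):

let
  a = linear(randomVector(6), randomMatrix(6, 8))   # a Layer[8, 6]
  b = linear(randomVector(5), randomMatrix(5, 7))   # a Layer[7, 5]

# a => b should fail to compile: a outputs 6 values while b expects 7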
In this particular case, there is no way to get F.M, because F could be (a priori) an instance of Layer[M, N] for more than one value of M and N. I know I will give only one overload of forward per layer type, so I will guarantee that M and N are uniquely determined, but I am not stating this anywhere in the types.
This is why I think there is the need to uniquely associate to a given layer type the values of M and N, and this is what I am getting at with the dimIn and dimOut functions. I just do not know how to express a partial function from typedesc to int.