nimforum mirror - Modeling probability distributions

andrea (orginal) [2016-11-23T16:53:05+01:00] view original

Hi, I am trying to write a library to work with probability random variables.

It is easy to define what a random variable is using concepts: sampling from a random variable should return a value of a certain type, chosen according to a random input.

This is why I am using BlaxSpirit random library.

In this library, a random number generator is defined as a concept as well. I am not sure how to define concepts using concepts, so for now I will stick with a concrete random number generator.

The definition I would like to give is then

type RandomVar[A] = concept x
  var rng = initMersenneTwister(urandom(16))
  rng.sample(x) is A

The library should then provide some basic random variables and some means to combine them in more complex ones. In particular, I would like to be able to lift functions on values to functions on random variables: if I have a function f: (A, B) -> C, I can easily get a g: (RandomVar[A], RandomVar[B]) -> RandomVar[C]. Whenever I need to sample from g(a, b), I can sample from a, sample from b and then apply f.

Now, I can define some basic random variables and it works nicely:

type
  Constant[A] = object
    value: A
  Uniform = object
    a, b: float
  Discrete[A] = object
    values: ref seq[A]

proc sample[A](rng: var RNG, c: Constant[A]): A = c.value

proc sample(rng: var RNG, u: Uniform): float = u.a + (u.b - u.a) * rng.random()

proc sample[A](rng: var RNG, d: Discrete[A]): A = rng.randomChoice(d.values[])

when isMainModule:
  var rng = initMersenneTwister(urandom(16))
  let
    c = constant(3)
    u = uniform(2, 18)
    d = choose(@[1, 2, 3])
  
  echo(c is RandomVar[int]) # true
  echo(u is RandomVar[float]) # true
  echo(d is RandomVar[int]) # true
  
  echo rng.sample(c)
  echo rng.sample(u)
  echo rng.sample(d)

The problem arises when trying to combine random variables. Let me stick to functions of a single input. For this I need random variables represented as closures.

I can define

type
  MyRNG = type(initMersenneTwister(urandom(16)))
  ProcVar[A] = object
    extract: proc(rng: var MyRNG): A

proc sample[A](rng: var MyRNG, p: ProcVar[A]): A = p.extract(rng)

To lift a single variable function, I can define

proc lift1[A, B](p: proc(a: A): B): proc(a: RandomVar[A]): ProcVar[B] =
  proc res(a: RandomVar[A]): ProcVar[B] =
    proc extract(rng: var MyRNG): B =
      let a1 = rng.sample(a)
      return p(a1)
    
    result.extract = extract
  
  return res

to be used like

proc sq(x: float): float = x * x

let s = lift1(sq)
let x = s(u)

but this does not compile, since RandomVar is a concept. I can just use ProcVar in place of it, like this:

proc lift1[A, B](p: proc(a: A): B): proc(a: ProcVar[A]): ProcVar[B] =
  proc res(a: ProcVar[A]): ProcVar[B] =
    proc extract(rng: var MyRNG): B =
      let a1 = rng.sample(a)
      return p(a1)
    
    result.extract = extract
  
  return res

but then to pass the other random variables to it, I need a converter

converter toProcVar[A](r: RandomVar[A]): ProcVar[A] =
  proc extract(rng: var MyRNG): A = rng.sample(r)
  
  result.extract = extract

If I use this converter, the compiler goes into an infinite loop. If I change the converter into a proc and try to use it explicitly like this

let
  s = lift1(sq)
  x = s(toProcVar[float](u))

the compiler just crashes.

What approach would you suggest to make the whole machinery work reasonably?

Krux02 (orginal) [2016-11-24T13:08:10+01:00] view original

I do not have your random library installed, so I cannot really test what I have just written, but I would do something like this:

type
  RandomVar[A] = proc(rng: var RNG): A

proc uniform(rng: var RNG; a,b: float) : float =
  a + (b - a) * rng.random()

proc c(rng: var RNG) : int   = 3
proc u(rng: var RNG) : float = rng.uniform(2,18)
proc d(rng: var RNG) : int   = rng.randomChoise([1,2,3])

when isMainModule:
  var rng = initMersenneTwister(urandom(16))
  
  echo(c is RandomVar[int]) # true
  echo(u is RandomVar[float]) # true
  echo(d is RandomVar[int]) # true
  
  echo rng.c
  echo rng.u
  echo rng.d

I think it's much more flexible, because now you can compose with arbitrary functions, and you do not need to create new types for each typo of composition.

andrea (orginal) [2016-11-25T11:54:59+01:00] view original

Hi Krux02, in order to try it out it is enough to depend on the linked library in your nimble file. One such as this would do:

# Package

version       = "0.1.0"
author        = "Andrea Ferretti"
description   = "Playground"
license       = "Apache2"

bin = @["example"]

# Dependencies

requires "nim >= 0.15.0", "random >= 0.5.3"

More to the point, in your example all three assertions return false. :-( I guess that this has to do with the fact that RNG is itself a concept.

Still, the problem is not how to define these particular basic distributions. The problem that I have is how to lift functions on values to functions on distributions.

For instance, how do I write the square of a distribution? I should be some form of closure, but I have not been able to write any form that actually compiles...

Krux02 (orginal) [2016-11-27T20:32:50+01:00] view original

So when I understood you correctly, you want to have a symbolic representation of the distribution, so that you can do symbolic transformations on them, but I am not sure about it. Then you should specify, if your distributions are is something known at compile time, or something that depends on data that is not known at compile time. This is a very important difference, because my solution I just posted can not handle distributions that are not known at compile time, it can only process distributions defiend as functions at compile time. After compilation it is not that easy to add a new function to the system itself, and then it starts that you build your own virtual machine, and that is a lot more complex.

andrea (orginal) [2016-11-28T10:43:16+01:00] view original

It is actually pretty easy to add functions at runtime: closures are there just for that! :-) Of course, not actual new functions, but still parametrized family of functions.

What I would like to do is just being able to manipulate random variables in the natural way. My ideal library would be usable like this:

var rng = ...
let
  a = uniform(-10, 10)
  b = gaussian(mean = 0, stddev = 2)
  c = abs(a + b)

echo rng.mean(c)
echo rng.sample(c)

For this to work I have to lift unary (abs) and binary (+) functions to distributions and so on.

Syntactic sugar aside, it is just what is called the probability monad in other programming languages

Mirror of forum.nim-lang.org

2632 :: Modeling probability distributions