The algorithm iterates 3790200 times and produces a factor of 535006138814359
- Using the BigInt library: 106.3 seconds
- Using the BigNum library: 1.38 seconds
- Using OCaml with zarith: 1.2 seconds
- Using F#: about 15 seconds
BigInt allows coding that closely follows the algorithm:

```nim
x = (x*x + c) mod n  # all variables are BigInts
```
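For context, here is a minimal sketch of the kind of rho loop being timed, assuming the nim-lang/bigints package (my actual rho1.nim is not shown; the demo modulus below is just a small semiprime):

```nim
import bigints

proc gcdBig(a, b: BigInt): BigInt =
  ## Euclid's algorithm, spelled out in case the package
  ## doesn't export a gcd for BigInt.
  var (x, y) = (a, b)
  let zero = initBigInt(0)
  while y != zero:
    (x, y) = (y, x mod y)
  x

proc pollardRho(n: BigInt): BigInt =
  ## Floyd-cycle variant of Pollard's rho with f(x) = x*x + 1 mod n.
  let c = initBigInt(1)
  let one = initBigInt(1)
  var x = initBigInt(2)
  var y = x
  var d = one
  while d == one:
    x = (x * x + c) mod n          # tortoise: one step
    y = (y * y + c) mod n          # hare: two steps
    y = (y * y + c) mod n
    d = gcdBig(if x > y: x - y else: y - x, n)
  d  # a non-trivial factor, or n if this c was unlucky

echo pollardRho(initBigInt("1000000016000000063"))  # 1000000007 * 1000000009
```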
BigNum is less readable, but acceptable to me:

```nim
x = `mod`(x, add(t2, mul(t1, x, x), c), n)
# or
discard `mod`(x, add(t2, mul(t1, x, x), c), n)
```
I find the discard statement ugly, and the alternative only adds about 0.01 seconds to the run. t1 and t2 are temporary variables whose contents are not otherwise used, since add and mul produce a return value.
So my conclusion is that BigInt is not suitable for rho-type problems with big numbers to factor, but it is very readable and suitable for less intensive problems.
Maybe also of interest:
https://github.com/mratsim/constantine/tree/master/constantine/math/arithmetic
If I compile bigints with

```
nim c -d:release --passC:-flto --passL:-s --mm:arc rho1.nim
```

I get improved timings of:

```
real 1m24.066s
user 1m24.060s
sys  0m0.000s
```
Heya, I don't really have anything to contribute, but I recently uploaded https://fsh.github.io/integers/integers.html which is intended to provide ergonomic integers (wrapping GMP) and seems kind of maybe-possibly-perhaps adjacent to what you're looking for.
No guarantees regarding polish or bugs or other platforms (I make some low-end assumptions), and as far as I know I've been the only user.
As for opinion pieces on bigint libraries in general, I also wrote a little note on the rather sad state of GMP vs "anything else" at https://github.com/fsh/integers#why-gmp
Why GMP?
Unfortunately there is just no beating GMP when it comes to performance. It is the unequivocal gold standard.
All “home-made” or “from scratch” integer implementations by hobbyists, such as Nim’s official bigints (and even Python’s built-in int), are laughably slow compared to GMP, often by orders of magnitude.
Alternative projects such as libtommath come much closer, but are still not quite as optimized as GMP.
The reason projects shy away from GMP is two-fold:
- The “serious” reason is that GMP is dual-licensed with LGPLv3 and GPLv2, and many consider even the LGPL too restrictive or convoluted (e.g. static linking can be problematic), so this makes it simply a no-go for many projects.
- The less practical reason is that implementing one’s own arbitrary-precision integers from scratch is (at the outset) a very attractive and exciting project for many programmers, so it is a common mind trap to fall into.
The first reason is unavoidable and unfortunate. The second reason is very understandable (I feel the same pull), although it is incredibly frustrating when I’m stuck with these cowboy implementations as a user.
Since I am of the opinion that (big) integers are a core data type that should be universally available, which implicitly demands that it be as efficient as humanly possible, I’m always going to bite the bullet on the (L)GPL license.
Being as fast as GMP is a three-fold issue: compiler code generation, asymptotically better algorithms, and engineering details like aliasing and compile-time support. But it is doable; for example, for sizes suitable for cryptography, my library is 3.8x faster than GMP on basic bigint multiplication. The ratio is probably 8x on modular multiplication.
How?
C code for add-with-carry on 256-bit integers:

```c
#include <stdint.h>
#include <x86intrin.h>

// a <- a + b over four 64-bit limbs, chaining the hardware carry flag
void add256(uint64_t a[4], uint64_t b[4]){
    uint8_t carry = 0;
    for (int i = 0; i < 4; ++i)
        carry = _addcarry_u64(carry, a[i], b[i], &a[i]);
}
```
GCC 9.2 ASM:

```asm
add256:
        movq    (%rsi), %rax
        addq    (%rdi), %rax
        setc    %dl
        movq    %rax, (%rdi)
        movq    8(%rdi), %rax
        addb    $-1, %dl
        adcq    8(%rsi), %rax
        setc    %dl
        movq    %rax, 8(%rdi)
        movq    16(%rdi), %rax
        addb    $-1, %dl
        adcq    16(%rsi), %rax
        setc    %dl
        movq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        addb    $-1, %dl
        adcq    %rax, 24(%rdi)
        ret
```
Clang 9.0 ASM:

```asm
add256:
        movq    (%rsi), %rax
        addq    %rax, (%rdi)
        movq    8(%rsi), %rax
        adcq    %rax, 8(%rdi)
        movq    16(%rsi), %rax
        adcq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        adcq    %rax, 24(%rdi)
        retq
```

Note how GCC spills the carry to %dl and restores it (setc/addb) between every limb, while Clang keeps the whole chain in adc instructions.
Now, porting my cryptographic optimizations to a generic arbitrary-precision library is something I started in https://github.com/SciNim/theo. However, then you get new time-consuming engineering challenges beyond just the algorithmic ones, for example for add: https://github.com/SciNim/theo/blob/9172900/theo/op_addsub.nim#L27-L35
```nim
func add*(r {.noalias.}: var BigInt, a, b: BigInt) =
  ## BigInt addition
  # TODO:
  # - relax the aliasing constraint
  # - dispatch on fixed size add for add that fits in registers
  #   and build add recursively
  #   to avoid loop counters resetting carry chains.
  # - Support negative inputs
  # - Support compile-time
```
For multiplication, you then need to implement, in increasing levels of overhead/acceleration: product-scanning (Comba) multiplication, Karatsuba multiplication, Toom-Cook multiplication (that one only pays off with thousands of digits, so a stretch goal), and FFT multiplication. A sketch of the first level follows.
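To make the first level concrete, here is a minimal sketch of product-scanning multiplication on 32-bit limbs (little-endian limb order assumed); `Limb` and `mulComba` are illustrative names, not Constantine's API:

```nim
type Limb = uint32

proc mulComba(a, b: openArray[Limb]): seq[Limb] =
  ## Product scanning: produce one output limb (column) per outer
  ## iteration, accumulating partial products in a 3-word accumulator.
  result = newSeq[Limb](a.len + b.len)
  var c0, c1, c2: uint64                        # lowest word first
  for k in 0 ..< result.len:
    for i in max(0, k + 1 - b.len) .. min(k, a.len - 1):
      let t = uint64(a[i]) * uint64(b[k - i])   # 32x32 -> 64 bits
      c0 += t and 0xFFFF_FFFF'u64
      c1 += (t shr 32) + (c0 shr 32)            # high half + carry from c0
      c0 = c0 and 0xFFFF_FFFF'u64
      c2 += c1 shr 32                           # carry from c1
      c1 = c1 and 0xFFFF_FFFF'u64
    result[k] = Limb(c0)
    (c0, c1, c2) = (c1, c2, 0'u64)              # shift the accumulator

# 0xFFFFFFFF * 0xFFFFFFFF = 0xFFFFFFFE_00000001
echo mulComba([0xFFFF_FFFF'u32], [0xFFFF_FFFF'u32])  # @[1, 4294967294]
```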
For division, expect days of debugging; everyone gets it wrong, even Google in Go and JavaScript: https://skanthak.homepage.t-online.de/division.html
Now, if someone is interested in implementing a fast bigint library for Nim, I can guide them; lots of the engineering challenges were already solved in Constantine.
Heya mratsim! When I first got into Nim, you're the person who kept coming up time and time again in the stuff I was searching for (including feature requests), so I think we have a lot of overlapping interests (even Go, judging by your GitHub picture).
But it is doable; for example, for sizes suitable for cryptography
Of course, I totally agree. I didn't say it outright in my note, but I was implicitly thinking of (and being frustrated with) the general use cases, such as Python's homemade bigints, etc. Having statically known sizes is a huge benefit when working with 𝔽_q or elliptic curves for specific applications, naturally. Then again, one could argue the low-level mpn_* API is part of GMP, and that could probably be used in the statically sized case for big speed-ups, though getting things to inline properly might be tricky, so that's a potential barrier. (Also worth a note: in the general case, one potentially big thing I don't think has been taken to its full potential is something like what C++ does with short-string optimization. I was thinking of trying something like that myself at some point.)
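As a sketch of that mpn_* idea, here is roughly what a hand-written Nim binding for a fixed four-limb add could look like; the wrapper and `add256` are assumptions for illustration, only `mpn_add_n` itself is real GMP API:

```nim
# Sketch: using GMP's low-level mpn layer for statically sized bignums.
{.passL: "-lgmp".}

type Limb = culong  # mp_limb_t is typically unsigned long on LP64

proc mpn_add_n(rp, s1p, s2p: ptr Limb, n: clong): Limb
  {.importc, header: "<gmp.h>".}

proc add256(a: var array[4, Limb], b: array[4, Limb]): Limb =
  ## a <- a + b over 4 limbs; returns the carry-out.
  ## GMP allows the destination to alias the first operand.
  mpn_add_n(addr a[0], addr a[0], unsafeAddr b[0], 4)

var x = [high(culong), 0.culong, 0.culong, 0.culong]
echo add256(x, [1.culong, 0.culong, 0.culong, 0.culong])  # carry 0, x[1] = 1
```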
But then you might want your library to work at compile-time as well
This is a specific topic I'm very interested in personally. A big frustration I have with extant languages is how most of the techniques for manipulating the boundary between comptime and runtime -- generics, static[], specialization of such, etc. -- are invariably pretty limiting/buggy to work with compared to what I think they could/should be. I've been toying with some proof-of-concepts (compiling to Zig instead of C) to explore the design space here. I want to be able to write code for Matrix(T: type): type which has run-time rows/cols fields in its representation, and then be able to specialize the type itself by assigning static comptime values to some fields -- also trying to figure out how to promote slices into static arrays etc. when possible. The idea is for fields to be automatically lifted out of the runtime representation into the comptime representation, like we currently would write Matrix[n: static[..], m: static[..], T]. I absolutely hate having to write code for the same algorithm twice: once using runtime parameters, possibly branching on those to more specialized cases, and then again in order to use const generics etc.
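In current Nim terms, the duplication I mean looks roughly like this (all names here are illustrative):

```nim
type
  Matrix[T] = object
    rows, cols: int            # dimensions live in the runtime value
    data: seq[T]
  StaticMatrix[R, C: static int; T] = object
    data: array[R * C, T]      # dimensions lifted into the type

proc trace[T](m: Matrix[T]): T =
  ## Runtime version: squareness is a runtime check.
  assert m.rows == m.cols
  for i in 0 ..< m.rows:
    result = result + m.data[i * m.cols + i]

proc trace[R: static int; T](m: StaticMatrix[R, R, T]): T =
  ## Same algorithm again: squareness is now enforced by the type.
  for i in 0 ..< R:
    result = result + m.data[i * R + i]

let a = Matrix[int](rows: 2, cols: 2, data: @[1, 2, 3, 4])
let b = StaticMatrix[2, 2, int](data: [1, 2, 3, 4])
echo a.trace, " ", b.trace   # 5 5
```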
For division, expect days of debugging; everyone gets it wrong, even Google in Go and JavaScript
Incidentally, it was recently discovered that bigints has a bug in its division implementation: https://github.com/nim-lang/bigints/issues/123
A big frustration I have with extant languages is how most of the techniques for manipulating the boundary between comptime and runtime -- generics, static[], specialization of such, etc. -- are invariably pretty limiting/buggy to work with compared to what I think they could/should be. I've been toying with some proof-of-concepts (compiling to Zig instead of C) to explore the design space here.
For bigint, Stint works and is tested at compile-time: https://github.com/status-im/nim-stint/blob/f366239/tests/test_uint_addsub.nim#L201-L202
But I don't think you can do the same with an arbitrary-precision lib: detecting aliasing is very important for optimization, to understand which buffers to use/reuse, and there are no pointers/addresses at compile-time.
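For illustration, here is a minimal sketch of a bigint kernel that also runs in the VM because it sticks to indexing and never takes an address (names are illustrative, not from Stint or Constantine):

```nim
proc addLimbs(r: var seq[uint64], a, b: seq[uint64]): uint64 =
  ## r <- a + b limb-wise (equal lengths); returns the carry-out.
  ## Uses only indexing, so it works at compile-time too, where
  ## addr/unsafeAddr and pointer tricks are unavailable.
  assert a.len == b.len
  r.setLen(a.len)
  var carry = 0'u64
  for i in 0 ..< a.len:
    let s = a[i] + b[i]                        # may wrap (mod 2^64)
    let c1 = if s < a[i]: 1'u64 else: 0'u64    # carry out of a + b
    r[i] = s + carry
    let c2 = if r[i] < s: 1'u64 else: 0'u64    # carry out of + carry
    carry = c1 + c2
  carry

static:  # evaluated by the Nim VM at compile-time
  var r: seq[uint64]
  doAssert addLimbs(r, @[high(uint64)], @[1'u64]) == 1
  doAssert r == @[0'u64]
```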
I want to be able to write code for Matrix(T: type): type which has run-time rows/cols fields in its representation, and then be able to specialize the type itself by assigning static comptime values to some fields -- also trying to figure out how to promote slices into static arrays etc. when possible.
I don't see where this is worthwhile: