nimforum mirror - Show Nim: JWTea, Crunchy and Depot. New repos you may find useful.

guzba [2023-01-25T01:59:14+01:00]

Hey all, today I wanted to do a quick post about three repos I've recently published and explain why you may find them useful.

JWTea

https://github.com/guzba/jwtea

This repo creates JSON Web Tokens and does the RSA signing in Nim (no OpenSSL or other external deps).
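To make the moving parts concrete, here is a minimal Python sketch of producing an RS256-signed JWT (the kind used with Google APIs): base64url-encode the header and claims, pad the SHA-256 digest with EMSA-PKCS1-v1_5 (RFC 8017), then run the RSA private-key powmod. The function names are hypothetical and this is not JWTea's API. Decoding the private key from PEM/DER is left out, and Python's pow() is not constant-time.

```python
import base64
import hashlib
import json

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def signing_input(claims: dict) -> bytes:
    # "<header>.<claims>", both compact JSON, base64url-encoded
    header = {"alg": "RS256", "typ": "JWT"}
    parts = [
        b64url(json.dumps(header, separators=(",", ":")).encode()),
        b64url(json.dumps(claims, separators=(",", ":")).encode()),
    ]
    return ".".join(parts).encode("ascii")

# DER-encoded DigestInfo prefix for SHA-256 (RFC 8017, section 9.2)
SHA256_DIGEST_INFO = bytes.fromhex("3031300d060960864801650304020105000420")

def emsa_pkcs1_v15(message: bytes, k: int) -> bytes:
    # EM = 0x00 || 0x01 || 0xff...0xff || 0x00 || DigestInfo || SHA-256(message)
    t = SHA256_DIGEST_INFO + hashlib.sha256(message).digest()
    if k < len(t) + 11:
        raise ValueError("modulus too short")
    return b"\x00\x01" + b"\xff" * (k - len(t) - 3) + b"\x00" + t

def sign_rs256(claims: dict, n: int, d: int) -> str:
    k = (n.bit_length() + 7) // 8  # modulus length in bytes
    msg = signing_input(claims)
    em = int.from_bytes(emsa_pkcs1_v15(msg, k), "big")
    sig = pow(em, d, n)  # the RSA private-key operation ("powmod")
    return msg.decode("ascii") + "." + b64url(sig.to_bytes(k, "big"))
```

The powmod on the last lines is where nearly all of the signing time goes.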

If you don't need JWT, you don't need this repo. If you do, such as for Google APIs, this repo is wonderful.

The RSA private key decoding and signing are done by Crunchy (more about Crunchy below). Doing all RSA in Nim means one less reason to even think about OpenSSL or other things. OpenSSL is a pain. Others have gone to great effort to use BearSSL, perhaps to avoid OpenSSL pain. That is very impressive work. For me, though, just using Nim is more in line with my goals.

Crunchy

https://github.com/guzba/crunchy

Crunchy has SIMD/intrinsic-accelerated procs for CRC-32, Adler-32, SHA-256, etc., with more to be added over time. The goal is one-line calls with simple signatures, e.g. crc32(s: string): uint32.
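For reference, the two checksums can be written as simple byte-at-a-time loops; accelerated versions compute the same values, just much faster. A minimal Python sketch (not Crunchy's code), cross-checked against the standard zlib module:

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    # Reflected CRC-32 (polynomial 0xEDB88320), processed one bit at a time
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def adler32(data: bytes) -> int:
    # Two running sums modulo 65521, the largest prime below 2**16
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % 65521
        b = (b + a) % 65521
    return (b << 16) | a

if __name__ == "__main__":
    print(hex(crc32_bitwise(b"123456789")), hex(zlib.crc32(b"123456789")))  # both 0xcbf43926
    print(hex(adler32(b"Wikipedia")), hex(zlib.adler32(b"Wikipedia")))      # both 0x11e60398
```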

Crunchy is also where the RSA code used by JWTea and other experimental cryptography code of mine lives, at least for now.

Getting RSA signing working introduced me to nim-lang/bigints. I was able to get RSA powmod working right away with it, which was great; however, the performance was not OK for RSA out of the box (5+ seconds for one RSA-2048 signature with -d:release). I have a small fork of nim-lang/bigints for now and was able to make powmod 10x faster (0.5s), but there is still more work to do here. It should take less than 0.05s.
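For context, powmod is usually square-and-multiply: for a 2048-bit private exponent that is about 2048 modular squarings plus one multiplication per 1 bit, each a full bigint multiply followed by a reduction, so the bigint layer's speed dominates signing time. A minimal Python sketch of the textbook algorithm (note that this naive form branches on each exponent bit):

```python
def powmod(base: int, exp: int, mod: int) -> int:
    # Left-to-right binary exponentiation: one squaring per exponent bit,
    # plus one multiplication for each bit that is 1.
    result = 1
    base %= mod
    for bit in bin(exp)[2:]:
        result = result * result % mod
        if bit == "1":
            result = result * base % mod
    return result
```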

Depot

https://github.com/guzba/depot

Depot is a lightweight lib for working with S3-compatible storage provider APIs including Amazon S3, Google Cloud Storage, Cloudflare R2 and Backblaze B2. The fact all these providers have compatible APIs is a wonderful accident of history.

The S3-compatible API these providers give you is just an HTTP API underneath common wrappers like boto3. Depot provides the URL signing, which is all that is needed to do any operation.
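As a rough idea of what that signing involves, here is a minimal Python sketch of an AWS Signature Version 4 presigned GET URL (path-style), the scheme S3 uses. The host, bucket, credentials and function name are placeholders, this is not Depot's API, and details such as the region string may differ between providers.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def presign_get(host, path, access_key, secret_key, region, now, expires=3600):
    """Sketch of a SigV4 presigned GET URL; `now` must be a UTC datetime."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = now.strftime("%Y%m%d")
    scope = f"{date}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: sorted keys, keys and values percent-encoded
    query = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}"
                     for k, v in sorted(params.items()))
    canonical_uri = quote(path, safe="/")
    canonical_request = "\n".join([
        "GET", canonical_uri, query,
        f"host:{host}\n",   # canonical headers, each terminated by a newline
        "host",             # signed headers
        "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])
    # Signing key chain: secret -> date -> region -> service -> "aws4_request"
    key = _hmac(("AWS4" + secret_key).encode(), date)
    for part in (region, "s3", "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{query}&X-Amz-Signature={signature}"

if __name__ == "__main__":
    t = datetime.datetime(2023, 1, 25, 12, 0, 0, tzinfo=datetime.timezone.utc)
    print(presign_get("s3.us-east-1.amazonaws.com", "/my-bucket/hello.txt",
                      "AKIDEXAMPLE", "SECRETEXAMPLE", "us-east-1", t))
```

Every operation (GET, PUT, DELETE, list) is just a different method, path and query run through the same signing steps.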

In the future, a full Nim wrapper around common operations would be nice, but that has not been done yet. I have an allergic reaction to auto-generated wrappers, so it won't be that.

Why should you care about object stores like S3? In brief: after your VM provider, these object stores are the next most useful kind of provider when building a web service.

Thanks for giving this a read!

mratsim [2023-01-25T04:42:30+01:00]

For modular exponentiation: https://forum.nim-lang.org/t/7276#46102, the technique is to convert the BigInt to "Montgomery form" (see: https://eprint.iacr.org/2017/1057).

In short, that allows you to do modular arithmetic, including modular multiplications, without a modulo operation. A modulo/division takes about 55 cycles while an addition takes 1 (and you can schedule 2 to 4 per cycle), so naively this would make your code 55x faster.

The conversion requires the modulus to be odd, which is the case for all primes of interest and for RSA moduli (RSA modulo 2 is not very interesting).
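A minimal Python sketch of the idea, assuming an odd modulus n: values are mapped to x·R mod n with R a power of two (here a multiple of 64 bits), and each product is reduced with REDC, which uses multiplications, masks and shifts instead of a division by n. The function names are illustrative and this is not Constantine's code; Python integers hide the word-by-word limb loop a real implementation needs, and the branches here are not constant-time.

```python
def montgomery_setup(n: int, word_bits: int = 64):
    # R = 2**k, with k a multiple of the word size and R > n
    assert n % 2 == 1, "Montgomery form needs an odd modulus"
    k = ((n.bit_length() + word_bits - 1) // word_bits) * word_bits
    r = 1 << k
    n_prime = -pow(n, -1, r) % r  # n * n_prime == -1 (mod R)
    return k, r, n_prime

def redc(t: int, n: int, k: int, n_prime: int) -> int:
    # Returns t * R^-1 mod n for 0 <= t < n*R, with no division by n
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # makes t + m*n divisible by R
    u = (t + m * n) >> k               # exact division by R is a shift
    return u - n if u >= n else u      # u < 2n, so one subtraction suffices

def mont_powmod(base: int, exp: int, n: int) -> int:
    k, r, n_prime = montgomery_setup(n)
    acc = r % n                   # 1 in Montgomery form
    base_m = (base % n) * r % n   # enter Montgomery form (one-time real reduction)
    for bit in bin(exp)[2:]:
        acc = redc(acc * acc, n, k, n_prime)
        if bit == "1":
            acc = redc(acc * base_m, n, k, n_prime)
    return redc(acc, n, k, n_prime)  # leave Montgomery form
```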

My whole code for modular exponentiation is here: https://github.com/mratsim/constantine/blob/2931913/constantine/math/arithmetic/limbs_montgomery.nim#L726-L835. The file has few dependencies: Limbs[N] = array[N, Word] and some architecture/compiler-agnostic add-with-carry and multiply-and-add primitives.

For SHA-256, you might be interested in a SIMD implementation for CPUs without SHA extensions: https://github.com/mratsim/constantine/blob/2931913/constantine/hashes/sha256/sha256_x86_ssse3.nim

guzba [2023-01-25T04:59:11+01:00]

Thanks for the info about Montgomery form. I did see mention of it when doing some RSA reading but nothing more than that so far. I'll give this a look.

I see mention in your linked comment that the modulus is required to be constant / static by Constantine. Is that still true? I can find out for myself but there is no harm in asking. If it is required to be constant, is that critical or just how it is now?

I was also planning to look into https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Using_the_Chinese_remainder_algorithm for signing. When contrasting CRT with Montgomery, are they both viable ways to powmod faster or is CRT the wrong area to investigate?

I have been doing a bit of cryptography stuff lately (did RC6-CBC and AES-256-GCM recently) and it has become clear there is an enormous amount I do not know. Go figure. It also doesn't really play to my strengths; this work has been a bit like rubbing my brain against sandpaper. I got through it though!

Araq [2023-01-25T08:52:16+01:00]

This is awesome!

ElegantBeef [2023-01-25T10:36:15+01:00]

Oh jeez, I see usage of openArray[byte]; never would I have thought I would see a crypto library that uses it! Though I do see some places where you've used string for no reason. It would be a lot cooler to use openArray[byte] there. Jokes aside, I am quite happy to see code that can operate on data (or slices of it) without having to copy it first.

mratsim [2023-01-26T12:25:33+01:00]

I see mention in your linked comment that the modulus is required to be constant / static by Constantine. Is that still true? I can find out for myself but there is no harm in asking. If it is required to be constant, is that critical or just how it is now?

AFAIK, in my link (https://github.com/mratsim/constantine/blob/2931913/constantine/math/arithmetic/limbs_montgomery.nim#L726-L835) the only thing static is spareBits. For some elliptic curves, for example Curve25519, the field elements are 255-bit but are stored in 256 bits (4 x 64-bit words). Having spare bits in the physical storage enables carry-less optimizations similar to https://cryptojedi.org/peter/data/croatia-20150604.pdf / https://cryptojedi.org/peter/data/pairing-20131122.pdf

I have been doing a bit of cryptography stuff lately (did RC6-CBC and AES-256-GCM recently) and it has become clear there is an enormous amount I do not know. Go figure. It also doesn't really play to my strengths; this work has been a bit like rubbing my brain against sandpaper. I got through it though!

For production-grade cryptography, it is very important that your code has no branches that depend on secret data, other than its size.

For example, this is absolutely forbidden: https://github.com/guzba/crunchy/blob/a282a89/src/crunchy/bigints.nim#L1353-L1354

That if depends on whether a secret key bit is 0 or 1, and that bit can be recovered by analysing timing differences, electromagnetic waves, power consumption variations or electromagnetic emissions.
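One standard way to remove the secret-bit if from exponentiation is the Montgomery ladder: every bit costs exactly one multiply and one square, in the same order, and the bit only drives a masked swap. A minimal Python sketch of that control flow (Python's big integers are not constant-time, so this shows the shape only; the bits parameter fixes the loop length, e.g. to the modulus size, and exp must be below 2**bits):

```python
def ct_swap(mask: int, a: int, b: int):
    # mask is 0 (keep) or -1, i.e. all ones (swap); no branch on the secret bit
    t = mask & (a ^ b)
    return a ^ t, b ^ t

def ladder_powmod(base: int, exp: int, mod: int, bits: int) -> int:
    # Invariant: r1 == r0 * base (mod mod) after every step
    r0, r1 = 1, base % mod
    for i in reversed(range(bits)):
        bit = (exp >> i) & 1
        mask = -bit                  # 0 or -1
        r0, r1 = ct_swap(mask, r0, r1)
        r1 = r0 * r1 % mod           # same two operations for bit 0 and bit 1
        r0 = r0 * r0 % mod
        r0, r1 = ct_swap(mask, r0, r1)
    return r0
```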

Similarly, the whole bigint backend is unsuitable for implementing cryptography because it has "if branches" everywhere that may expose secret data: https://github.com/guzba/crunchy/blob/a282a89/src/crunchy/bigints.nim#L117-L138

Furthermore, exposing one bit of secret data can be equivalent to exposing everything; see "padding oracle" attacks: https://joyofcryptography.com/pdf/book.pdf

And attacks get more and more powerful with machine learning: https://eprint.iacr.org/2019/358

One trace is all it takes: Machine Learning-based Side-channel Attack on EdDSA

Profiling attacks, especially those based on machine learning proved as very successful techniques in recent years when considering side-channel analysis of block ciphers implementations. At the same time, the results for implementations of public-key cryptosystems are very sparse. In this paper, we consider several machine learning techniques in order to mount a power analysis attack on EdDSA using the curve Curve25519 as implemented in WolfSSL. The results show all considered techniques to be viable and powerful options. The results with convolutional neural networks (CNNs) are especially impressive as we are able to break the implementation with only a single measurement in the attack phase while requiring less than 500 measurements in the training phase. Interestingly, that same convolutional neural network was recently shown to perform extremely well for attacking the AES cipher. Our results show that some common grounds can be established when using deep learning for profiling attacks on distinct cryptographic algorithms and their corresponding implementations.

libgcrypt (GnuPG) broken by leaking exponent bits during RSA modular exponentiation: https://lwn.net/Articles/727179/

WPA3 broken by timing routers' response times: https://wpa3.mathyvanhoef.com/

Intel SGX enclaves broken by reading the on-chip power meter during cryptographic operations: https://arstechnica.com/information-technology/2020/11/intel-sgx-defeated-yet-again-this-time-thanks-to-on-chip-power-meter/

Various techniques and countermeasures to attack cryptographic implementations, despite multiple layers of defences: https://eprint.iacr.org/2019/010.pdf

Besides, secure cryptography often requires assembly, because even when you write branchless bit manipulation, for example for a select like let a = if a < 0: x else: y, the compiler might rewrite it into an if branch instead of a conditional move: https://www.cl.cam.ac.uk/~rja14/Papers/whatyouc.pdf
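The branchless version of such a select usually turns the condition into an all-ones or all-zeros mask and combines both inputs with AND/OR. A minimal Python sketch on 64-bit words (names are illustrative; Python only shows the bit logic, and in a compiled language this exact pattern is what an optimizer may turn back into a branch):

```python
MASK64 = (1 << 64) - 1

def is_negative64(a: int) -> int:
    # Sign bit of a 64-bit two's complement word: 1 if a < 0, else 0
    return ((a & MASK64) >> 63) & 1

def ct_select(cond: int, x: int, y: int) -> int:
    # cond must be 0 or 1; returns x if cond == 1 else y, with no if
    mask = (-cond) & MASK64              # all ones or all zeros
    return (x & mask) | (y & ~mask & MASK64)

if __name__ == "__main__":
    a, x, y = -5, 0xAAAA, 0x5555
    print(hex(ct_select(is_negative64(a), x, y)))  # same result as: x if a < 0 else y
```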

Lastly, your AES implementation, for example, has cache-timing issues due to indexing arrays with secret data (https://github.com/guzba/crunchy/blob/a282a89d/src/crunchy/aes256.nim#L76). Unfortunately this is one big flaw of AES (see https://www.bearssl.org/constanttime.html#aes): it is very hard to implement in constant time without hardware support, though attackers need to run a program (for example in a co-located VM) on the same CPU as the one running the AES encrypt/decrypt.
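A portable (but slow) mitigation for a secret-indexed table is to read every entry and keep the wanted one with a mask, so the memory access pattern no longer depends on the secret. A minimal Python sketch for a table of at most 256 entries such as an AES S-box (the name is illustrative; production constant-time AES typically relies on hardware AES instructions or bitslicing instead):

```python
def ct_lookup(table: list, secret_index: int) -> int:
    # Touch every entry so the access pattern (and cache footprint)
    # is independent of secret_index. Assumes len(table) <= 256.
    result = 0
    for i, value in enumerate(table):
        diff = i ^ secret_index          # 0 only for the wanted entry
        eq = ((diff - 1) >> 8) & 1       # 1 if diff == 0, else 0 (for 0 <= diff < 256)
        result |= value & -eq            # -eq is all ones or zero
    return result
```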

Re BigInt design, for RSA, this is a portable constant-time bigint design: https://www.bearssl.org/bigint.html#big-integer-design Constantine uses a different one, because I use intrinsics or assembly for carries in particular to guarantee the absence of branches.
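The key trick on that BearSSL page is storing 31-bit words in 32-bit variables ("i31"), so a carry always fits in the spare top bit and is extracted with a shift instead of a comparison. A minimal Python sketch of that idea for addition (function names are illustrative, not BearSSL's API):

```python
LIMB_BITS = 31
LIMB_MASK = (1 << LIMB_BITS) - 1

def to_limbs(x: int, n: int) -> list:
    # Little-endian list of n 31-bit limbs
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n)]

def from_limbs(limbs: list) -> int:
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

def add_limbs(a: list, b: list):
    # Each limb is below 2**31, so every sum fits in 32 bits and the carry
    # is read with a shift, never with a comparison or branch.
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & LIMB_MASK)
        carry = s >> LIMB_BITS
    return out, carry
```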

mratsim [2023-01-26T13:15:17+01:00]

I forgot the most important part: a demo of how to experiment with and benchmark my modular exponentiation:

https://forum.nim-lang.org/t/7276#46102

It can be made faster using Karatsuba. Multiplication complexity is O(n²) with n the number of words in the bigint; RSA-2048 requires 32 64-bit words, hence a cost of about 32² = 1024 word multiplications. Karatsuba is ~O(n^1.585), hence about 32^1.585 = 3^5 = 243. So we can naively estimate a 4x speedup (actually probably 2x, because you trade multiplications for additions).

Regarding the CRT (Chinese Remainder Theorem): yes, it will accelerate signing, because you split the computation into two halves, modulo p and modulo q. Each O(n²) multiplication on half-size numbers is about 4x cheaper, and the half-size exponents d mod (p-1) and d mod (q-1) also need half as many steps, so each half exponentiation is roughly 8x cheaper. You have to do two of them, so the total speedup is roughly 4x.
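A minimal Python sketch of Karatsuba on non-negative integers: split each operand into high and low halves and get the cross terms from one extra product instead of two. Recursing from 32 words down to single words gives 3^5 = 243 word products instead of 32² = 1024; the cutoff parameter is an illustrative choice, since real implementations switch to schoolbook multiplication well above one word.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 64) -> int:
    # Base case: small enough for a schoolbook/hardware multiply
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    z2 = karatsuba(x_hi, y_hi, cutoff_bits)
    z0 = karatsuba(x_lo, y_lo, cutoff_bits)
    # One multiplication yields both cross terms: x_hi*y_lo + x_lo*y_hi
    z1 = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff_bits) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0
```

The extra additions and subtractions around z1 are the "trade multiplications for additions" cost mentioned above.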
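A minimal Python sketch of CRT signing with Garner's recombination. In a real key, dp, dq and q_inv are precomputed and stored (PKCS#1 private keys carry them as exponent1, exponent2 and coefficient); they are recomputed here only to keep the example self-contained, and pow() is not constant-time.

```python
def crt_sign(m: int, p: int, q: int, d: int) -> int:
    # Normally precomputed once per key
    dp = d % (p - 1)
    dq = d % (q - 1)
    q_inv = pow(q, -1, p)
    # Two half-size exponentiations with half-length exponents
    s_p = pow(m % p, dp, p)
    s_q = pow(m % q, dq, q)
    # Garner's recombination: s = s_q + q * ((s_p - s_q) * q_inv mod p)
    h = (q_inv * (s_p - s_q)) % p
    return s_q + q * h
```

The result equals pow(m, d, p * q), computed on numbers half the size.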

guzba [2023-01-26T22:31:53+01:00]

Yeah I definitely don't encourage use of the RC6 or AES code in any real setting, I just wanted to learn the general mechanics being used and RC6 was a simpler place to start before moving to AES. I would use the x64 / arm64 intrinsics for "real" AES if I ever have that need.

My true goals for cryptography work are two-fold:

JWT without OpenSSL / anything else. Working, but can be better. Already on this, just a matter of time. Low implementation risk since the intent is just to produce JWT for consuming APIs like those provided by Google.

Learn more about cryptography in general with the dumb goal of someday implementing TLS 1.3 from scratch for Mummy. I think I only need one widely supported key exchange method and one symmetric enc method (ChaCha20-Poly1305 probably?). Enabling Nim to support TLS in HTTP servers and eventually clients without OpenSSL would be great. I recognize this is going head-first into the implementation risk you describe. This does not stop me, though, as I refuse to be trapped by fear in a world of dependency pain forever.

mratsim [2023-01-27T00:34:01+01:00]

1. JWT: I'll probably add support for it, if only HMAC-SHA256 for https://github.com/ethereum/execution-apis/blob/main/src/engine/authentication.md and whatever is needed to automate Spotify ( https://developer.spotify.com/documentation/general/guides/authorization/use-access-token/ ), low priority though. It sounds like a chore. Also it's a problematic standard:

https://paragonie.com/blog/2017/03/jwt-json-web-tokens-is-bad-standard-that-everyone-should-avoid

Fun post: a timing attack over the network on bad JWT (server) implementations: https://hackernoon.com/can-timing-attack-be-a-practical-security-threat-on-jwt-signature-ba3c8340dea9

I'd like to implement TLS 1.2, 1.3, DTLS and QUIC as well... someday... I just fear putting my hand in a black hole of handshakes and sessions. I'm not too worried about the cryptography, except for the dozens of failure modes of RSA (it seems like an easy algorithm, but use the wrong padding and you're rekt, https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Padding, and use a weak isPrime function, say a badly parameterized Miller-Rabin, and you can be exploited, https://eprint.iacr.org/2018/749). And yes to ChaCha20-Poly1305: easy to make constant-time, decent speed on all hardware despite having no dedicated hardware instructions, and it can be accelerated by SIMD.
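Why ChaCha20 is easy to make constant-time shows up in its core step: the quarter round uses only 32-bit additions, rotations and XORs, with no secret-dependent branches or table lookups. A minimal Python sketch following RFC 8439, section 2.1, checked against that RFC's quarter-round test vector:

```python
def rotl32(x: int, n: int) -> int:
    # Rotate a 32-bit word left by n bits
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def quarter_round(a: int, b: int, c: int, d: int):
    # ChaCha20 quarter round: add, xor, rotate only (ARX)
    a = (a + b) & 0xFFFFFFFF; d = rotl32(d ^ a, 16)
    c = (c + d) & 0xFFFFFFFF; b = rotl32(b ^ c, 12)
    a = (a + b) & 0xFFFFFFFF; d = rotl32(d ^ a, 8)
    c = (c + d) & 0xFFFFFFFF; b = rotl32(b ^ c, 7)
    return a, b, c, d

if __name__ == "__main__":
    out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(w) for w in out])  # ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

Independent quarter rounds on different columns of the state are also what SIMD implementations run in parallel lanes.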

user2m [2023-08-02T04:33:04+02:00]

Is there currently support for decoding the JWT and getting the claims back in string form? I'm having a tough time getting any of the other JWT libraries (nim-jwt, quickjwt) running on a Linux machine, and yours is the only one that works! But I can't seem to find facilities to validate the token or get back its claim content.
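For anyone looking for the shape of this, a minimal Python sketch of decoding a JWT and getting the claims back as a string after checking an HS256 signature (the HMAC-SHA256 variant). The function name is hypothetical and this is not JWTea's API; an RS256 token would be checked with the RSA public key instead, and claims such as exp are not validated here.

```python
import base64
import hashlib
import hmac
import json

def b64url_decode(s: str) -> bytes:
    # Restore the padding that JWT strips
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def verify_hs256(token: str, secret: bytes) -> str:
    """Return the claims JSON string if the HS256 signature checks out."""
    header_b64, claims_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        # Never let the token choose the algorithm (e.g. "none")
        raise ValueError("unexpected alg")
    signed = f"{header_b64}.{claims_b64}".encode()
    expected = hmac.new(secret, signed, hashlib.sha256).digest()
    # Constant-time comparison so response timing does not leak the MAC
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return b64url_decode(claims_b64).decode("utf-8")
```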

grd [2023-08-04T08:39:14+02:00]

Good question. You need to realize that @guzba has only made 7 commits there and is apparently "away" from GitHub at the moment. Maybe you can create an issue, since the only issue in there has been fixed recently. Was that you?

guzba [2023-08-04T22:22:16+02:00]

What does "away" from GitHub mean or where are you seeing that?

grd [2023-08-04T22:45:40+02:00]

I am sorry. I didn't see any work by you for a while. Please forget it.

Mirror of forum.nim-lang.org

9845 :: Show Nim: JWTea, Crunchy and Depot. New repos you may find useful.