Pardon the clickbait, but I'm glad to see the "concurrent" word counting https://github.com/LucaWolf/wordFreq-nim/tree/concurrent being slightly faster than its counterpart https://github.com/LucaWolf/wordFreq/tree/concurrent, for the imposed core count. Still, I would not go about abusing threads the way Go allows. Interestingly, it is slower than the single-threaded version.
P.S. I used this as an exercise to learn Malebolgia and threading channels, hence the opportunity for comparison; I'm not chasing speed here.
> Interestingly, slower than the single threaded version.
Alright, but that makes it a toy then. ;-) Process files via memfiles and split the work into same-sized chunks; ignore lines, since a newline is then just another way to write a space.
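A minimal sketch of that approach, assuming plain ASCII-whitespace-separated words. Python's `mmap` stands in for Nim's `memfiles`, and `word_freq`, `chunk_bounds`, `count_chunk` and `n_workers` are hypothetical names used only for illustration. Each cut point is pushed forward to the next whitespace byte so no word straddles two chunks, and newlines are treated like any other space. Note that under CPython's GIL these threads will not actually count in parallel; the sketch shows the splitting scheme, not a speedup.

```python
import mmap
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(data, n_chunks):
    """Split data into roughly n_chunks same-sized byte ranges.

    Each cut is moved forward until it lands on a whitespace byte (or the
    end of the data), so a word is never split across two chunks.
    """
    size = len(data)
    step = max(1, size // n_chunks)
    bounds = []
    start = 0
    while start < size:
        end = min(start + step, size)
        # Advance the cut past any word it would otherwise split.
        while end < size and not data[end:end + 1].isspace():
            end += 1
        bounds.append((start, end))
        start = end
    return bounds


def count_chunk(data, start, end):
    # bytes.split() with no argument splits on any run of ASCII whitespace,
    # so b"\n" is handled exactly like b" ": lines are simply ignored.
    return Counter(data[start:end].split())


def word_freq(path, n_workers=4):
    """Count words in the file at path using n_workers equal-sized chunks."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return Counter()  # an empty file cannot be memory-mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            bounds = chunk_bounds(mm, n_workers)
            total = Counter()
            with ThreadPoolExecutor(max_workers=n_workers) as pool:
                # Map step: one partial Counter per chunk.
                partials = pool.map(lambda b: count_chunk(mm, *b), bounds)
                # Reduce step: merge partial counts (done before mm closes).
                for partial in partials:
                    total.update(partial)
    return total
```

Calling `word_freq("input.txt", n_workers=4)` returns a `Counter` keyed by the raw `bytes` of each word. A compiled implementation would do the same split but run each chunk on a real worker thread.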
That's not an interesting benchmark for threadpools.
I have a compilation at https://github.com/mratsim/weave/tree/master/benchmarks, but unfortunately I'm missing the most interesting and most stressful one: UTS (Unbalanced Tree Search) https://github.com/bsc-pm/bots/tree/master/omp-tasks/uts
Summary
| Name | Parallelism | Notable for stressing | Origin |
|---|---|---|---|
| Black & Scholes Option Pricing (Finance) | Data Parallelism | | PARSEC (Princeton Application Repository for Shared-Memory Computers) |
| BPC (Bouncing Producer-Consumer) | Task Parallelism | Load Balancing (Extreme) | Dinan et al. / Tasking 2.0 (A. Prell Thesis) |
| DFS (Depth-First Search) | Task Parallelism | Scheduler Overhead | Staccato |
| Fibonacci | Task Parallelism | Scheduler Overhead (Extreme) | Cilk |
| Heat diffusion (Stencil / Jacobi-iteration - Cache-Oblivious) | Task Parallelism | | Cilk |
| Matrix Multiplication (Cache-Oblivious) | Task Parallelism | | Cilk |
| Matrix Multiplication (GEMM, BLAS) | Nested Data Parallelism | Compute, Memory, SIMD Vectorization, reference bench for super-computers | BLAS, Linpack |
| Matrix Transposition | Nested Data Parallelism | Nested loop | Laser |
| Nqueens | Task Parallelism | Speculative/Conditional parallelism | Cilk |
| SPC (Single Task Producer) | Task Parallelism | Load Balancing | Tasking 2.0 (A. Prell Thesis) |
| Histogram | Parallel Map-Reduce | Contention | Stack Overflow |
| LogSumExp (needed for Softmax cross-entropy in machine learning) | Parallel Map-Reduce | Huge matrices and expensive functions | Machine Learning |
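For the LogSumExp row: it is commonly evaluated in the numerically stable form below, which subtracts the maximum before exponentiating. Written this way it becomes two reductions over the data (a max, then a sum of exponentials), which is what makes it a map-reduce workload with an expensive per-element function. This is the standard identity, not necessarily how the benchmark above implements it.

$$
\operatorname{LSE}(x_1,\dots,x_n) = \log \sum_{i=1}^{n} e^{x_i} = m + \log \sum_{i=1}^{n} e^{x_i - m}, \qquad m = \max_{i} x_i
$$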