I use Windows 10, 64-bit.
In my case, I want to check whether an image is blurred or not, following the steps described at https://pyimagesearch.com/2020/06/15/opencv-fast-fourier-transform-fft-for-blur-detection-in-images-and-video-streams/
As far as I know there is no obvious option, so my question is: are there any libraries that can perform common image processing in the frequency domain, with a small file size and a commercial-friendly license?
Thanks
See here: https://github.com/SciNim/impulse/blob/26e25e7/impulse/fft/pocketfft.nim#L302-L339
For now there is no high-level API because I couldn't get feedback on what people would want.
It requires C++ compilation as well. I started to implement a pure Nim one, but it was quite an effort and I had no time at the time.
Oh, FFTs are fun to play with. Is there a SciNim repo I could add a PR to?
Here's a C version of the "Numerical Recipes" classic: https://github.com/saulwiggin/Numerical-Recipies-in-C/blob/master/Chapter12.Fast-Fourier-Transforms/four1.c
Good reference for the maths: https://faculty.washington.edu/seattle/brain-physics/FFT/numerical-recipes.pdf
You can play with FFT here: https://github.com/SciNim/impulse/tree/26e25e701be75446ad2b91403a4538465f44f1b5/impulse/fft; it has links to high-performance FFT documentation.
There is the start of a Nim one in this commit: https://github.com/SciNim/impulse/tree/49b813232507470a047727712acda105b84c7815/impulse/fft
The algorithms I planned were the same as PocketFFT and FFTPACK, explained in Brian Gough, "FFT Algorithms" (1999), and also in Clive Temperton's papers (the links are dead, so I have to find them again).
Note that the API was chosen to ensure multithreading would work easily, i.e. allocate everything up front when creating an FFT planner and pass `ptr UncheckedArray` around.
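For illustration, something like this rough planner sketch is what I mean (the type and proc names here are just hypothetical, not the actual impulse API):

```nim
import std/[math, complex]

type
  FftPlanner*[T] = object
    n: int                      # transform length, fixed at plan creation
    twiddles: seq[Complex[T]]   # precomputed twiddle factors
    scratch: seq[Complex[T]]    # workspace, allocated once up front

proc newFftPlanner*[T: SomeFloat](n: int): FftPlanner[T] =
  ## Allocate twiddles and scratch once, so `execute` itself never allocates
  ## and can be called from many threads on disjoint buffers.
  result.n = n
  result.twiddles = newSeq[Complex[T]](n)
  for k in 0 ..< n:
    let theta = -2.0 * PI * float(k) / float(n)
    result.twiddles[k] = complex(T(cos(theta)), T(sin(theta)))
  result.scratch = newSeq[Complex[T]](n)

proc execute*[T](plan: var FftPlanner[T];
                 input, output: ptr UncheckedArray[Complex[T]]) =
  ## Transform `plan.n` samples from `input` into `output`. Raw
  ## `ptr UncheckedArray` in/out keeps the hot path GC- and allocation-free.
  discard  # the butterfly passes would go here
```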
Those are interesting implementations! I've never implemented any of them. Though, they do seem to add significant complexity. :/
I'd say a simpler FFT would be useful for many cases and easier to implement for an initial version, from which an API and unit tests could be worked out. Sounds like the FFTW3 folks use the Cooley–Tukey FFT along with prime-factor ones:
> The current version of FFTW incorporates many good ideas from the past thirty years of FFT literature. In one way or another, FFTW uses the Cooley-Tukey algorithm, the prime factor algorithm, Rader’s algorithm for prime sizes, and a split-radix algorithm (with a “conjugate-pair” variation pointed out to us by Dan Bernstein).
Seems like they switch between different "plans" based on the various factors. Some of your PocketFFT code had bits of a plan, right? That seems like a good idea.
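For reference, a textbook recursive radix-2 Cooley–Tukey is tiny; an untested sketch like this (power-of-two sizes only, zero optimization) could be the seed of that initial version:

```nim
import std/[math, complex]

proc fft*(x: seq[Complex[float]]): seq[Complex[float]] =
  ## Recursive radix-2 Cooley-Tukey DFT; `x.len` must be a power of two.
  let n = x.len
  if n <= 1:
    return x
  var even = newSeq[Complex[float]](n div 2)
  var odd = newSeq[Complex[float]](n div 2)
  for i in 0 ..< n div 2:
    even[i] = x[2 * i]
    odd[i] = x[2 * i + 1]
  let e = fft(even)
  let o = fft(odd)
  result = newSeq[Complex[float]](n)
  for k in 0 ..< n div 2:
    let t = exp(complex(0.0, -2.0 * PI * float(k) / float(n))) * o[k]
    result[k] = e[k] + t
    result[k + n div 2] = e[k] - t

when isMainModule:
  # 8-point impulse: the spectrum of a delta function is flat (all ones).
  var signal = newSeq[Complex[float]](8)
  signal[0] = complex(1.0, 0.0)
  for bin in fft(signal):
    echo bin
```

Once something like this passes unit tests, mixed-radix and real-input paths could be layered on behind the same API.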
all you need is a fast to-JPEG converter
Yes, if the DCT is an acceptable alternative to the FFT, then check out pixie
At least from the "checkmarks" in the README, pixie does not write JPEG. So, it may not do the raster -> JPEG step. ggplotnim was another thought, but it looks like it uses cairo as an image-writing backend. I looked a little through Nimbleverse, but could not find a native Nim JPEG encoder. So, you may be in C library wrapper land to get even this.
DCT is really just a symmetric matrix product form, and if you are only doing "small scale" ones, like how JPEG tiles its operation, then all this deep dive into scalable / performant / multi-threaded DFT (while interesting) is off-point. You can just always do 8x8 or whatever small scale as a practical alternative that runs purely in L1 CPU cache.
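To make that concrete, here is a plain, unoptimized sketch of an 8x8 DCT-II done as two small matrix products, C * X * C^T, with a precomputed coefficient matrix:

```nim
import std/math

const N = 8  # JPEG-style tile size

proc dctMatrix(): array[N, array[N, float]] =
  ## Orthonormal DCT-II coefficient matrix C, so a 2D DCT is just C * X * C^T.
  for k in 0 ..< N:
    let alpha = if k == 0: sqrt(1.0 / float(N)) else: sqrt(2.0 / float(N))
    for n in 0 ..< N:
      result[k][n] = alpha *
        cos(PI * (2.0 * float(n) + 1.0) * float(k) / (2.0 * float(N)))

proc dct2d*(tile: array[N, array[N, float]]): array[N, array[N, float]] =
  ## 2D DCT-II of one tile via two matrix products: C * tile * C^T.
  let c = dctMatrix()
  var tmp: array[N, array[N, float]]   # tmp = C * tile
  for i in 0 ..< N:
    for j in 0 ..< N:
      for k in 0 ..< N:
        tmp[i][j] += c[i][k] * tile[k][j]
  for i in 0 ..< N:                    # result = tmp * C^T
    for j in 0 ..< N:
      for k in 0 ..< N:
        result[i][j] += tmp[i][k] * c[j][k]
```

A per-tile blur score could then be something as simple as the fraction of coefficient energy outside the low-frequency corner.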
I have worked on "blur scoring" before, and I would re-iterate my suggestion that "average tiled blurriness" is about as good as "global blurriness" as per the blog post inspiring @oyster. That post essentially creates a global score anyway. Further, if focus did vary a lot over some images, having the image tiled in "user coordinates" is also more friendly.
Coincidentally, before this topic arose, I happened to have just pushed something doing a single small tile as part of a "perceptive hasher": https://github.com/c-blake/ndup/blob/main/ndup/pHash.nim for near-duplicate video detection (though there are complementary applications like "interestingly varying regions of long, boring security camera footage" and so on { yes, touching on how MPEG encoding works ;-) }).
So, if there is a pure Nim requirement and grayscale is enough, Oyster could just "tile the pHash DCT with some padding". Otherwise he might have to muck with color planes. It seems he is already. But a full JPEG encoder would do all of the above, and might even multi-thread over the (fully independent!) tiles. Maybe this will inspire someone to write one in pure Nim. Also, notably, GPUs can help a lot for this kind of encoding.
As mentioned, but worth emphasizing and ending with -- if input arrives in JPEG already, as from say almost any digital camera, then someone else has already done the hard computation. All you need is a color planes/pixel resolution normalized compression ratio which you could compute a global value for in nanoseconds off of metadata alone. You can almost have a formula between quality / blurriness & compression ratio (or at least an empirical curve/lookup table). More work would be needed for "tiled" ratios, but then you are kind of building up the FFT from parts anyway (looking at tiles of tiles afterward and so on). It is not perfect, but then neither is the original blog post.
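A minimal sketch of that normalized ratio (the caller would supply width/height/channels parsed from the JPEG header; thresholds or an empirical curve are left out):

```nim
import std/os

proc compressionRatio*(jpegPath: string; width, height, channels: int): float =
  ## Compressed bytes per byte of raw pixel data. At a fixed quality setting,
  ## blurrier images tend to compress better, i.e. give a lower ratio.
  let compressed = float(getFileSize(jpegPath))
  let raw = float(width * height * channels)
  compressed / raw
```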
Converting to JPEG is really a smart idea, since I am too lazy to wrap a 1D FFT along the x-axis and then the y-axis to get a 2D FFT.
btw, I tested six images and got the table below. For this small set, it seems that "mean of JPEG", "RMS of JPEG", "variance of JPEG", and "standard deviation of JPEG" can be used to tell whether an image is blurred or not:
filename | blurred (human eye) | mean of IFFT(FFT) (ref 1) | variance of Laplacian filter (ref 2) | mean of JPEG | RMS of JPEG | variance of JPEG | standard deviation of JPEG |
---|---|---|---|---|---|---|---|
1.png | Y | 31.29 | 2804.34 | 110.00 | 109.26 | 39.79 | 6.31 |
2.png | N | 30.20 | 2546.01 | 45.00 | 77.05 | 2554.14 | 50.54 |
3.png | N | 37.79 | 3779.27 | 44.00 | 72.01 | 2688.46 | 51.85 |
4.png | Y | 16.17 | 85.73 | 108.00 | 108.75 | 8.54 | 2.92 |
5.png | Y | 13.52 | 262.06 | 109.00 | 108.96 | 17.94 | 4.23 |
6.png | Y | 24.08 | 317.89 | 110.00 | 109.45 | 16.95 | 4.12 |
ref 1: https://pyimagesearch.com/2020/06/15/opencv-fast-fourier-transform-fft-for-blur-detection-in-images-and-video-streams/
ref 2: https://pyimagesearch.com/2015/09/07/blur-detection-with-opencv/
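For the "variance of Laplacian (ref 2)" column, here is a minimal pure-Nim sketch of that score, assuming the grayscale pixel values are already decoded into a `seq[seq[float]]`:

```nim
proc laplacianVariance*(gray: seq[seq[float]]): float =
  ## Variance of a 3x3 Laplacian response over interior pixels;
  ## low values mean few edges, which suggests a blurry image.
  let h = gray.len
  let w = gray[0].len
  var responses: seq[float]
  for y in 1 ..< h - 1:
    for x in 1 ..< w - 1:
      let lap = gray[y-1][x] + gray[y+1][x] + gray[y][x-1] + gray[y][x+1] -
                4.0 * gray[y][x]
      responses.add lap
  var mean = 0.0
  for r in responses: mean += r
  mean /= float(responses.len)
  for r in responses: result += (r - mean) * (r - mean)
  result /= float(responses.len)
```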
PS1: Why not test more images, for example https://github.com/Kwentar/blur_dataset? Because that dataset is too large.
PS2: The table syntax for this forum is a nightmare.
Thank you.