I use Windows 10, 64-bit.
In my case, I want to check whether an image is blurred or not, following the steps described at https://pyimagesearch.com/2020/06/15/opencv-fast-fourier-transform-fft-for-blur-detection-in-images-and-video-streams/
As far as I know there is no obvious option, so my question is: are there any libraries that can perform common image processing in the frequency domain, with a small file size and a commercial-friendly license?
Thanks
See here: https://github.com/SciNim/impulse/blob/26e25e7/impulse/fft/pocketfft.nim#L302-L339
For now there is no high-level API because I couldn't get feedback on what people would want.
It requires C++ compilation as well. I started to implement a pure Nim one, but it was quite an effort and I had no time at the time.
Oh, FFTs are fun to play with. Is there a SciNim repo I could add a PR to?
Here's a C version of the "Numerical Recipes" classic: https://github.com/saulwiggin/Numerical-Recipies-in-C/blob/master/Chapter12.Fast-Fourier-Transforms/four1.c
Good reference for the maths: https://faculty.washington.edu/seattle/brain-physics/FFT/numerical-recipes.pdf
You can play with FFT here: https://github.com/SciNim/impulse/tree/26e25e701be75446ad2b91403a4538465f44f1b5/impulse/fft; it has links to high-performance FFT documentation.
There is the start of a Nim one in this commit: https://github.com/SciNim/impulse/tree/49b813232507470a047727712acda105b84c7815/impulse/fft
The algorithms I planned were the same as PocketFFT and FFTPACK, explained in Brian Gough, "FFT Algorithms" (1999), and also in Clive Temperton's papers (the links are dead, so I have to find them again).
Note that the API was chosen to ensure multithreading would work easily, i.e. allocate everything up front when creating an FFT planner and pass `ptr UncheckedArray` around.
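For illustration, something like this rough planner sketch is what I mean (the type and proc names here are just hypothetical, not the actual impulse API):

```nim
import std/[math, complex]

type
  FftPlanner*[T] = object
    n: int                      # transform length, fixed at plan creation
    twiddles: seq[Complex[T]]   # precomputed twiddle factors
    scratch: seq[Complex[T]]    # workspace, allocated once up front

proc newFftPlanner*[T: SomeFloat](n: int): FftPlanner[T] =
  ## Allocate twiddles and scratch once, so `execute` itself never allocates
  ## and can be called from many threads on disjoint buffers.
  result.n = n
  result.twiddles = newSeq[Complex[T]](n)
  for k in 0 ..< n:
    let theta = -2.0 * PI * float(k) / float(n)
    result.twiddles[k] = complex(T(cos(theta)), T(sin(theta)))
  result.scratch = newSeq[Complex[T]](n)

proc execute*[T](plan: var FftPlanner[T];
                 input, output: ptr UncheckedArray[Complex[T]]) =
  ## Transform `plan.n` samples from `input` into `output`. Raw
  ## `ptr UncheckedArray` in/out keeps the hot path GC- and allocation-free.
  discard  # the butterfly passes would go here
```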
Those are interesting implementations! I've never implemented any of them. Though, they do seem to add significant complexity. :/
I'd say a simpler FFT would be useful for many cases and easier to implement for an initial version, from which an API and unit tests could be worked out. Sounds like the FFTW3 folks use the Cooley–Tukey FFT along with prime-factor ones:
> The current version of FFTW incorporates many good ideas from the past thirty years of FFT literature. In one way or another, FFTW uses the Cooley-Tukey algorithm, the prime factor algorithm, Rader’s algorithm for prime sizes, and a split-radix algorithm (with a “conjugate-pair” variation pointed out to us by Dan Bernstein).
Seems like they switch between different "plans" based on the various factors. Some of your PocketFFT code had bits of a plan, right? That seems like a good idea.
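For reference, a textbook recursive radix-2 Cooley–Tukey is tiny; an untested sketch like this (power-of-two sizes only, zero optimization) could be the seed of that initial version:

```nim
import std/[math, complex]

proc fft*(x: seq[Complex[float]]): seq[Complex[float]] =
  ## Recursive radix-2 Cooley-Tukey DFT; `x.len` must be a power of two.
  let n = x.len
  if n <= 1:
    return x
  var even = newSeq[Complex[float]](n div 2)
  var odd = newSeq[Complex[float]](n div 2)
  for i in 0 ..< n div 2:
    even[i] = x[2 * i]
    odd[i] = x[2 * i + 1]
  let e = fft(even)
  let o = fft(odd)
  result = newSeq[Complex[float]](n)
  for k in 0 ..< n div 2:
    let t = exp(complex(0.0, -2.0 * PI * float(k) / float(n))) * o[k]
    result[k] = e[k] + t
    result[k + n div 2] = e[k] - t

when isMainModule:
  # 8-point impulse: the spectrum of a delta function is flat (all ones).
  var signal = newSeq[Complex[float]](8)
  signal[0] = complex(1.0, 0.0)
  for bin in fft(signal):
    echo bin
```

Once something like this passes unit tests, mixed-radix and real-input paths could be layered on behind the same API.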
all you need is a fast to-JPEG converter
Yes, if the DCT is an acceptable alternative to the FFT, then check out pixie
At least from the "checkmarks" in the README, pixie does not write JPEG. So, it may not do the raster -> JPEG step. ggplotnim was another thought, but it looks like it uses cairo as an image-writing backend. I looked a little through Nimbleverse, but could not find a native Nim JPEG encoder. So, you may be in C library wrapper land to get even this.
DCT is really just a symmetric matrix product form, and if you are only doing "small scale" ones, like how JPEG tiles its operation, then all this deep dive into scalable / performant / multi-threaded DFT (while interesting) is off-point. You can just always do 8x8 or whatever small scale as a practical alternative that runs purely in L1 CPU cache.
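To make that concrete, here is a plain, unoptimized sketch of an 8x8 DCT-II done as two small matrix products, C * X * C^T, with a precomputed coefficient matrix:

```nim
import std/math

const N = 8  # JPEG-style tile size

proc dctMatrix(): array[N, array[N, float]] =
  ## Orthonormal DCT-II coefficient matrix C, so a 2D DCT is just C * X * C^T.
  for k in 0 ..< N:
    let alpha = if k == 0: sqrt(1.0 / float(N)) else: sqrt(2.0 / float(N))
    for n in 0 ..< N:
      result[k][n] = alpha *
        cos(PI * (2.0 * float(n) + 1.0) * float(k) / (2.0 * float(N)))

proc dct2d*(tile: array[N, array[N, float]]): array[N, array[N, float]] =
  ## 2D DCT-II of one tile via two matrix products: C * tile * C^T.
  let c = dctMatrix()
  var tmp: array[N, array[N, float]]   # tmp = C * tile
  for i in 0 ..< N:
    for j in 0 ..< N:
      for k in 0 ..< N:
        tmp[i][j] += c[i][k] * tile[k][j]
  for i in 0 ..< N:                    # result = tmp * C^T
    for j in 0 ..< N:
      for k in 0 ..< N:
        result[i][j] += tmp[i][k] * c[j][k]
```

A per-tile blur score could then be something as simple as the fraction of coefficient energy outside the low-frequency corner.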
I have worked on "blur scoring" before, and I would re-iterate my suggestion that "average tiled blurriness" is about as good as "global blurriness" as per the blog post inspiring @oyster. That post essentially creates a global score anyway. Further, if focus did vary a lot over some images, having the image tiled in "user coordinates" is also more friendly.
Coincidentally, before this topic arose, I happened to have just pushed something doing a single small tile as part of a "perceptive hasher": https://github.com/c-blake/ndup/blob/main/ndup/pHash.nim for near-duplicate video detection (though there are complementary applications like "interestingly varying regions of long, boring security camera footage" and so on { yes, touching on how MPEG encoding works ;-) }).
So, if there is a pure Nim requirement and grayscale is enough, Oyster could just "tile the pHash DCT with some padding". Otherwise he might have to muck with color planes. It seems he is already. But a full JPEG encoder would do all of the above, and might even multi-thread over the (fully independent!) tiles. Maybe this will inspire someone to write one in pure Nim. Also, notably, GPUs can help a lot for this kind of encoding.
As mentioned, but worth emphasizing and ending with -- if input arrives in JPEG already, as from say almost any digital camera, then someone else has already done the hard computation. All you need is a color planes/pixel resolution normalized compression ratio which you could compute a global value for in nanoseconds off of metadata alone. You can almost have a formula between quality / blurriness & compression ratio (or at least an empirical curve/lookup table). More work would be needed for "tiled" ratios, but then you are kind of building up the FFT from parts anyway (looking at tiles of tiles afterward and so on). It is not perfect, but then neither is the original blog post.
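A minimal sketch of that normalized ratio (the caller would supply width/height/channels parsed from the JPEG header; thresholds or an empirical curve are left out):

```nim
import std/os

proc compressionRatio*(jpegPath: string; width, height, channels: int): float =
  ## Compressed bytes per byte of raw pixel data. At a fixed quality setting,
  ## blurrier images tend to compress better, i.e. give a lower ratio.
  let compressed = float(getFileSize(jpegPath))
  let raw = float(width * height * channels)
  compressed / raw
```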
Converting to JPEG is really a smart idea, since I am too lazy to wrap a 1D FFT along the x-axis and then the y-axis to get a 2D FFT.
btw, I tested six images and got the table below. For this small set, it seems that "mean of JPEG", "RMS of JPEG", "variance of JPEG", and "standard deviation of JPEG" can be used to tell whether an image is blurred or not:
filename | blurred (human eye) | mean of IFFT(FFT) (ref 1) | variance of Laplacian filter (ref 2) | mean of JPEG | RMS of JPEG | variance of JPEG | standard deviation of JPEG |
---|---|---|---|---|---|---|---|
1.png | Y | 31.29 | 2804.34 | 110.00 | 109.26 | 39.79 | 6.31 |
2.png | N | 30.20 | 2546.01 | 45.00 | 77.05 | 2554.14 | 50.54 |
3.png | N | 37.79 | 3779.27 | 44.00 | 72.01 | 2688.46 | 51.85 |
4.png | Y | 16.17 | 85.73 | 108.00 | 108.75 | 8.54 | 2.92 |
5.png | Y | 13.52 | 262.06 | 109.00 | 108.96 | 17.94 | 4.23 |
6.png | Y | 24.08 | 317.89 | 110.00 | 109.45 | 16.95 | 4.12 |
ref 1: https://pyimagesearch.com/2020/06/15/opencv-fast-fourier-transform-fft-for-blur-detection-in-images-and-video-streams/
ref 2: https://pyimagesearch.com/2015/09/07/blur-detection-with-opencv/
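For the "variance of Laplacian (ref 2)" column, here is a minimal pure-Nim sketch of that score, assuming the grayscale pixel values are already decoded into a `seq[seq[float]]`:

```nim
proc laplacianVariance*(gray: seq[seq[float]]): float =
  ## Variance of a 3x3 Laplacian response over interior pixels;
  ## low values mean few edges, which suggests a blurry image.
  let h = gray.len
  let w = gray[0].len
  var responses: seq[float]
  for y in 1 ..< h - 1:
    for x in 1 ..< w - 1:
      let lap = gray[y-1][x] + gray[y+1][x] + gray[y][x-1] + gray[y][x+1] -
                4.0 * gray[y][x]
      responses.add lap
  var mean = 0.0
  for r in responses: mean += r
  mean /= float(responses.len)
  for r in responses: result += (r - mean) * (r - mean)
  result /= float(responses.len)
```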
PS1: Why not test more images, for example https://github.com/Kwentar/blur_dataset? Because that dataset is too large.
PS2: The table syntax for this forum is a nightmare.
Thank you.