Hi all, new user here, so apologies if I'm missing something obvious, or if I'm posting this question in the wrong place. I've been playing with an old (no updates in the last two years) project on github: https://github.com/jesvedberg/tpix
This is a Nim program that takes an image from a file and writes it to stdout in a particular format with special escape characters, which results in the kitty terminal emulator drawing the image on screen. It uses the pixie library (which I've heard good things about) for image processing. What has surprised me is how slow it is compared to viu, a program written in Rust that does the same thing. I'm wondering if image processing really is this much slower in Nim, or if (as wouldn't at all surprise me) I'm doing something wrong. Note that I am compiling with the -d:release flag on the latest version of Nim (2.0.2).
In greater detail, the program does the following:
1) Decode the input image from its file format.
2) Resize the image to fit the terminal.
3) Encode the resized image as a png.
4) Send the result to the terminal in chunks using kitty's escape codes.
This process takes ~180ms, whereas the viu program (which may not follow the same steps, I haven't looked at its source) does this in 30ms.
I tried using the profiler and discovered that a large majority of the time was in step 3, encoding the image as a png. This isn't necessary, as you can send raw pixel data to kitty instead. So I replaced this step with a simple function that turns the image's pixel data into a string. With this change, the program instead takes ~85ms.
So we got a major speed boost, but we're still taking far longer than viu. Rerunning the profiler suggests that most of the remaining time is in steps 1 and 2: decoding and resizing the input image. I don't think there's much I can do about those steps.
Any pointers to what I might be doing wrong or who I could ask would be greatly appreciated. Thanks!
Resizing can't be the only issue because if I tell it not to resize at all, it still takes 80-90ms, whereas the other program consistently takes 25-30ms regardless of size.
...actually, it's pretty interesting that the nim program takes 85ms with no resize, but it takes 30ms if I resize it to very small. That would suggest that the majority of the time isn't being spent on resizing or encoding the original image. That time deficit has to be going towards either generating the output or printing it to stdout, even though the profiler says most of the time isn't spent doing that. So now I'm doubting the profiler.
We tried to make pixie really fast; I doubt your problem is pixie's speed.
After looking at the viu program in Rust (sorry, I don't know Rust well), it looks like it uses the viuer crate to actually print the image. The way it appears to do it is to write the image to a tmp file and have the kitty terminal read it. So the entire viu program eventually just prints
echo "\x1b_Gf=32,s=100,v=100,c=10,r=10,a=T,t=t;path/to/file.png\x1b\\"
See: https://docs.rs/viuer/latest/src/viuer/printer/kitty.rs.html#135-145
A single-line program that just writes a short string to stdout will be faster than one that decodes/resizes/encodes and then sends it block by block through the terminal codes.
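In Nim the same trick is only a couple of lines (a rough sketch, not viu's actual code; note that kitty actually wants the file path base64-encoded in the payload):

import std/base64

# Let the terminal read the file itself: t=t marks the payload as a
# (temporary) file path, f=100 declares PNG data, a=T displays it.
stdout.write "\x1b_Gf=100,a=T,t=t;" & encode("path/to/file.png") & "\x1b\\"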
The code I have, in its current form, shares these first two steps with viu, but I suspect it doesn't do them as efficiently as it could. I know you're a primary pixie developer, so would you mind if I picked your brain on this? This is what the code is doing now:
1) Get a pixie Image from a file or from stdin: filename.readImage or stdin.readLine.readImage
2) Extract raw RGBA values from the pixie Image: let rawData = encode(imgData(img)) ## where imgData is a function that I wrote. It's very possible there's already a better way to do this?
proc imgData(img: Image): string =
  for d in img.data:
    result.add(char(d.r))
    result.add(char(d.g))
    result.add(char(d.b))
    result.add(char(d.a))
Thanks.
That loop
proc imgData(img: Image): string =
  for d in img.data:
    result.add(char(d.r))
    result.add(char(d.g))
    result.add(char(d.b))
    result.add(char(d.a))
is a crazy slow way of doing this.
If you just need raw RGBA values, just use image.data. If you need a pointer to it, use image.data[0].addr, and use copyMem if you need it copied somewhere.
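Something like this (an untested sketch) does it in a single copy:

import pixie

proc imgData(img: Image): string =
  # Each pixel in img.data is 4 bytes (RGBA) stored contiguously,
  # so the whole buffer can be copied with one copyMem.
  result = newString(img.data.len * 4)
  if img.data.len > 0:
    copyMem(result[0].addr, img.data[0].addr, result.len)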
The implementation details make an enormous difference here.
As pointed out by guzba, using a sequence the way you are using it here will be slow. It's a very common way to go about things if you are coming from a language like Python.
A very naive way of implementing a dynamically sized array would do something like this under the hood:
When you initialize an array, the CPU allocates memory for that new array.
When you add an item to the array, the CPU then allocates a new, larger block of memory, copies all the existing items into it, writes the new item, and deallocates the old block.
So every time a new item is added there is: one allocation, a copy of every existing item, and one deallocation.
As you can see that is a lot of work being done every time a new item is added.
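In code, the naive scheme above looks something like this (a toy illustration, not Nim's real seq implementation; real seqs reserve extra capacity so appends are amortized O(1)):

proc naiveAdd(arr: var seq[char], item: char) =
  var bigger = newSeq[char](arr.len + 1) # allocate a new, larger block
  for i in 0 ..< arr.len:                # copy every existing item
    bigger[i] = arr[i]
  bigger[arr.len] = item                 # write the new item
  arr = bigger                           # the old block is deallocated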
To avoid this you instead initialize the original array with a length equal to the number of items you need it to hold:
var pixelArray: array[imageWidth * imageHeight * 4, char] # array sizes must be compile-time constants in Nim
Here I'm just multiplying by 4 for the RGBA values.
This way there is only one memory allocation and zero deallocations.
If you want to use a sequence, you can initialize it with type and length using newSeq:
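var pixelSeq = newSeq[char](imageWidth * imageHeight * 4) # one up-front allocation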
I'm actually not sure if there is any performance penalty for using a seq instead of an array in Nim as long as you initialize with a length and don't delete from it.
But as guzba pointed out, you already have the pixels in the image array so this was only to add a little more explanation.
(Disclaimer: my explanation of the implementation of a dynamic length array here is just theoretical and only for a base understanding)
P.S. I told you treeform and guzba would get on it if you posted it here instead of Reddit :-)
This is a much better explanation of what I was trying to say.
One other tip: if you need to write raw pixels to a file, you can use the pointer + len version of writeFile along with the image.data[0].addr pointer and image.data.len * 4 as the byte length.
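For example (a quick sketch of the same idea using the stdlib's writeBuffer, which takes a pointer and a length; the input file name is just for illustration):

import pixie

let img = readImage("input.png")   # assumed input for the example
let f = open("pixels.raw", fmWrite)
# Write the raw RGBA bytes straight from pixie's pixel buffer,
# 4 bytes per pixel, with no intermediate string.
discard f.writeBuffer(img.data[0].addr, img.data.len * 4)
f.close()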
Thanks for the suggestions. I will play around with this over the weekend.
I had considered initializing the data structure at the desired final size, but then didn't do it. I guess what happened was that the Nim profiler told me this part of the code was taking up a pretty small percentage of the total time, so I just didn't worry about it. Later, when I had reason to distrust the profiler, I tried removing one step at a time, and it turned out this step takes nearly 25% of the total time on my test case (~20ms out of ~85ms); some of that might also be in the call to encode, I'd have to check. So some improvement here would be nice.
Hi. I wrote tpix. I haven't updated tpix in a long time for two reasons: the first is that I bought a MacBook Air and have mostly been using iTerm2 instead of Kitty since then, and the second is that tpix was more or less feature complete as far as I was concerned (though I have been considering adding support for the iTerm2 image protocol as well).
I didn't really spend much time optimizing tpix, but as you've shown there is definitely room to do so, especially when running it on a local computer. The big reason I never felt it was necessary is that my main use case was running it on remote systems, where transferring the data is generally much slower than any other step, so it made sense to me to keep things simple: just convert the image to a compressed png file, split it into chunks, and send it to Kitty.
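Roughly, the chunked transfer works like this (a from-memory sketch, not tpix's actual code; kitty takes at most 4096 bytes of base64 per escape code, with m=1 on every chunk except the last):

import std/base64

proc sendToKitty(png: string) =
  let payload = encode(png)   # kitty payloads are base64-encoded
  var i = 0
  while i < payload.len:
    let hi = min(i + 4096, payload.len)
    let m = if hi < payload.len: "1" else: "0"
    if i == 0:
      # the first chunk carries the control keys; f=100 means PNG data
      stdout.write "\x1b_Gf=100,a=T,m=" & m & ";" & payload[i ..< hi] & "\x1b\\"
    else:
      stdout.write "\x1b_Gm=" & m & ";" & payload[i ..< hi] & "\x1b\\"
    i = hi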
In your list of steps that tpix performs, you forgot to mention converting the data to base64. I suspect that step is faster than converting the image to a png, though I don't really know.
The conversion to base64 does take some time, but yeah, it's not the slowest operation.