Hi all, I was trying to speed up the decompression of zip archives by running more threads in parallel. I have an Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz 2.21 GHz laptop with 16 GB RAM and an SSD, so not exactly a powerhouse, but it still has 4 cores and 8 threads. I have a folder containing 4 .zip files of approximately the same size and compression factor (each file is around 120 MB uncompressed and around 10 MB compressed), and I used this toy code (old Nim 1.4.4, for practical reasons related to avoiding false malware detections):
#unzip_parallel.nim - compiled with nim c -d:release --mm:orc --threads:on unzip_parallel.nim (I also tried --mm:arc and the default; I'm on Nim 1.4.4)
import zippy/zipArchives, std/[os, sugar, times, threadpool, strformat]
let archives = collect(newSeq):
  for archive in walkFiles("*.zip"):
    archive
proc xtract(filename: string) =
  let tempfolder = "temp_" & filename[0..^5]
  extractAll(filename, tempfolder)
let startTime = cpuTime()
#I know I've just 4 zip archives in this directory
spawn xtract(archives[0])
spawn xtract(archives[1])
spawn xtract(archives[2])
spawn xtract(archives[3])
sync()
let endTime = cpuTime()
let elapsed = endTime - startTime
echo(fmt"unzip of all files completed in: {elapsed} secs")
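The four hard-coded spawn calls can be generalized to one task per archive found. The following is only a minimal sketch under some assumptions: the file name unzip_loop.nim and the helper outDirFor are hypothetical names, the module is imported as zippy/ziparchives (lowercase, which also resolves on case-sensitive file systems), and the elapsed time is taken from a monotonic clock instead of cpuTime().

```nim
# Sketch: compile with `nim c -d:release --threads:on unzip_loop.nim`
# (hypothetical file name); needs the zippy package installed.
import zippy/ziparchives
import std/[os, monotimes, times, threadpool, strformat]

proc outDirFor(filename: string): string =
  ## Hypothetical helper: "file1.zip" -> "temp_file1",
  ## the same naming as the listing above.
  "temp_" & filename.changeFileExt("")

proc xtract(filename: string) =
  extractAll(filename, outDirFor(filename))

proc main() =
  # Collect every .zip in the current directory.
  var archives: seq[string]
  for archive in walkFiles("*.zip"):
    archives.add archive

  let start = getMonoTime()
  for archive in archives:
    spawn xtract(archive)   # one task per archive, however many there are
  sync()                    # wait for every spawned task to finish
  let elapsed = getMonoTime() - start
  echo fmt"unzip of {archives.len} files completed in: {elapsed.inMilliseconds} ms"

when isMainModule:
  main()
```

This does not change how the work is scheduled compared to the listing above; it only removes the assumption that exactly 4 archives are present.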
The serial version is this one:
#unzip_serial.nim
import zippy/zipArchives, std/[os, sugar, times, strformat]
let archives = collect(newSeq):
  for archive in walkFiles("*.zip"):
    archive
proc xtract(filename: string) =
  let tempfolder = "temp_" & filename[0..^5]
  extractAll(filename, tempfolder)
let startTime = cpuTime()
xtract(archives[0])
xtract(archives[1])
xtract(archives[2])
xtract(archives[3])
let endTime = cpuTime()
let elapsed = endTime - startTime
echo(fmt"unzip of all files completed in: {elapsed} secs")
While the serial version takes approx. 24 secs, i.e. 6 secs per file, the parallel version takes around 127 secs, so it's much, much worse. Am I using the wrong module/approach, or is the Windows task scheduler doing a bad job? Unfortunately my laptop always runs some anti-malware + antivirus software in the background, and I do not know whether it applies constraints/limitations to unsigned binaries created by unknown companies.
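A side note on the measurement itself: both listings time the work with cpuTime(), which is documented as processor time for the process and, depending on the OS, may report real time instead. When comparing a serial run against a threaded one, a monotonic wall-clock reading is less ambiguous. Here is a small sketch of that idea, using only the standard library; wallClock is a hypothetical helper name and sleep() merely stands in for the extraction work.

```nim
import std/[monotimes, times, os]

proc wallClock(work: proc ()): Duration =
  ## Hypothetical helper: runs `work` and returns the elapsed wall-clock time.
  let start = getMonoTime()
  work()
  getMonoTime() - start

when isMainModule:
  # sleep() stands in for the real extraction work here.
  let cpuStart = cpuTime()
  let wall = wallClock(proc () = sleep(200))
  let cpu = cpuTime() - cpuStart
  echo "wall-clock: ", wall.inMilliseconds, " ms"
  echo "cpuTime:    ", cpu, " s"
```

On platforms where cpuTime() really counts processor time, the second number can be much smaller than the first, because a sleeping (or I/O-blocked) process uses almost no CPU.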
I've also run another test. I've crafted two trivial .bat files (start /B is a bit like & on Linux: it lets the next command start without waiting for the current one to complete):
#tar_unzip.bat
start /B tar -xf file1.zip -m
start /B tar -xf file2.zip -m
start /B tar -xf file3.zip -m
start /B tar -xf file4.zip -m
#nim_unzip.bat
start /B unzip_single file1.zip
start /B unzip_single file2.zip
start /B unzip_single file3.zip
start /B unzip_single file4.zip
unzip_single is basically the compiled Nim binary that applies zippy's extractAll to paramStr(1).
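For reference, such a single-archive binary could look roughly like the sketch below. The destination folder naming (the hypothetical helper outDirFor) and the usage message are assumptions for illustration; only the use of extractAll on paramStr(1) comes from the description above.

```nim
# Sketch of a single-archive extractor; needs the zippy package installed.
import zippy/ziparchives
import std/os

proc outDirFor(filename: string): string =
  ## Assumed naming: "file1.zip" -> "temp_file1".
  "temp_" & filename.changeFileExt("")

when isMainModule:
  if paramCount() != 1:
    quit("usage: unzip_single <archive.zip>", 1)
  let archive = paramStr(1)          # archive path given on the command line
  extractAll(archive, outDirFor(archive))
```

Each start /B line in the .bat file then launches one independent process, so no Nim threading is involved in that test.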
Run this way, the .bat based on the Nim / zippy binary also takes around 127 secs to complete, while the .bat based on the Windows built-in tar utility takes around 18 secs. Extracting a single file with tar takes around 6 secs, like zippy, so about 24 secs serially: that is a 4/3 speed-up (I hoped for more, but it is a small improvement).
Any clue about why? Has any of you successfully improved zippy's decompression time by working on different files in parallel? By how much? Thank you in advance.
I've moved to my old "faster" Windows PC. Despite being an older model (Intel(R) Core(TM) i5-7300U CPU @ 2.60GHz 2.70 GHz with just 8 GB RAM and an older, I assume slower, SSD, though maybe not), it's not plagued by the same CPU-hungry antivirus / anti-malware software running in the background. The situation is totally different: the parallel version is slightly faster than the serial one (10 secs vs 15 secs), and this happens on both Nim 2.0.0 and Nim 1.4.4. Performance-wise they are roughly on par; however, to be fair, the 1.4.4 parallel version crashes sometimes, so 2.0.0 is the more solid choice in this respect:
PS C:\Users\Andrea\Documents\Nim\testing_code\nim_parallel> ./unzip_parallel
unzip of all files completed in: 10.024 secs
PS C:\Users\Andrea\Documents\Nim\testing_code\nim_parallel> ./unzip_parallel
unzip of all files completed in: 11.237 secs
PS C:\Users\Andrea\Documents\Nim\testing_code\nim_parallel> ./unzip_parallel
unzip of all files completed in: 10.553 secs
PS C:\Users\Andrea\Documents\Nim\testing_code\nim_parallel> ./unzip_parallel_old
oserr.nim(94) raiseOSError
Error: unhandled exception: The handle is invalid.
[OSError]
PS C:\Users\Andrea\Documents\Nim\testing_code\nim_parallel> ./unzip_parallel_old
unzip of all files completed in: 10.858 secs
PS C:\Users\Andrea\Documents\Nim\testing_code\nim_parallel> ./unzip_parallel_old
unzip of all files completed in: 10.202 secs
PS C:\Users\Andrea\Documents\Nim\testing_code\nim_parallel> ./unzip_parallel_old
oserr.nim(94) raiseOSError
Error: unhandled exception: The handle is invalid.
[OSError]
PS C:\Users\Andrea\Documents\Nim\testing_code\nim_parallel> ./unzip_parallel_old
unzip of all files completed in: 10.877 secs
Bottom line: my work PC sucks, and installed software, particularly antivirus / anti-malware (the latter eats 293 MB and is often the top process for CPU usage), can really hurt performance in an unpredictable manner. On my work PC the serial version can use at most 14.9% of the CPU and the parallel version drops much lower, down to 2-3%, while on my old home PC the parallel version uses more than 95% of the CPU. Thank you, and sorry for bothering you.