Hi,
I have about 8 GB of RAM and I want to read a 16 GB file. What is the best way? I know I can't keep all the data in a seq[]; is there an alternative? Or is it better to read the data directly from the file rather than loading it into a seq and then processing it? Also, is there a way to keep 10% of the data in memory and process it while another process reads the next 10%, so that I can then move on to that data?
To explain the difference: memfiles does not actually put the entire file in memory, but makes you believe it does. It uses the memory management unit that is normally used for swapping RAM, but in this context it reads the contents of the file as soon as they are accessed. This requires as much address space as the size of the file. In your case that means it is not possible to use memfiles on a 32-bit system, but on 64-bit it is no problem at all.
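As a minimal sketch of the memfiles approach, the following counts the lines of a file through a memory mapping. The helper name `countLines` is made up for illustration; `memfiles.open` and `memSlices` come from the standard `memfiles` module. Pages are only read from disk when the loop touches them, so on a 64-bit system this should work for files larger than RAM.

```nim
import std/memfiles

# Hypothetical helper: count line-like records via a memory-mapped file.
proc countLines(path: string): int =
  # Map the whole file read-only; nothing is copied into a seq.
  var mf = memfiles.open(path, mode = fmRead)
  defer: mf.close()
  # memSlices yields one (pointer, length) slice per line without
  # allocating a string for it.
  for slice in memSlices(mf):
    inc result
```

Each slice is only valid while the mapping is open, so copy anything you need to keep before calling `close`.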
FileStreams read the file sequentially as you go, so you do not get the array-like random access to the file that memfiles gives you.
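For comparison, here is a rough sketch of sequential reading with a FileStream. The helper `sumBytes` and the 64 KiB chunk size are assumptions chosen for illustration; `newFileStream` and `readData` are from the standard `streams` module. Only one chunk is held in memory at a time.

```nim
import std/streams

# Hypothetical helper: sum all byte values of a file, one chunk at a time.
proc sumBytes(path: string, chunkSize = 64 * 1024): uint64 =
  let s = newFileStream(path, fmRead)
  if s.isNil:
    raise newException(IOError, "could not open " & path)
  defer: s.close()
  var buf = newString(chunkSize)
  while true:
    # readData returns how many bytes were actually read (0 at end of file).
    let n = s.readData(addr buf[0], chunkSize)
    if n <= 0:
      break
    for i in 0 ..< n:
      result += uint64(ord(buf[i]))
```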
@andrea has a spills module that pages a seq to/from disk (IIRC). This may help or give you some ideas.
I don't agree with @Krux02, because memfiles reads in one or more blocks of the file, where each block is the ``PAGE SIZE`` of your OS (usually 4K or 8K). If you are on a 32-bit OS, you only have about 2 GB of address space to play with (in practice less than that, because of whatever else lives in your program's address space). The issue arises when you ask for too many blocks at one time and exceed your free address space.
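One way to keep the number of mapped blocks small is to map the file one window at a time. The sketch below assumes that `memfiles.open` accepts `mappedSize` and `offset` arguments and that `window` is a multiple of the OS page size (or of the allocation granularity on Windows); `countByte` and the 1 MiB default are made-up names and values for illustration.

```nim
import std/[memfiles, os]

# Hypothetical helper: count occurrences of `target`, mapping only one
# window of the file into the address space at a time.
proc countByte(path: string, target: char, window = 1 shl 20): int =
  let total = int(getFileSize(path))
  var offset = 0
  while offset < total:
    # Never map past the end of the file.
    let size = min(window, total - offset)
    var mf = memfiles.open(path, mode = fmRead,
                           mappedSize = size, offset = offset)
    let p = cast[ptr UncheckedArray[char]](mf.mem)
    for i in 0 ..< size:
      if p[i] == target:
        inc result
    mf.close()   # release this window before mapping the next one
    offset += size
```

With this pattern the address space in use at any moment is bounded by `window` rather than by the file size.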
Can you process your data as a single running calculation (rather than having to calculate by parsing all the data multiple times)?
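To illustrate what a single running calculation could look like, here is a sketch that computes a mean in one pass. It assumes, purely for the example, that the file holds one number per line; `meanOfLines` is a hypothetical name. Only the running total and count stay in memory, not the data.

```nim
import std/strutils

# Hypothetical helper: mean of a file with one number per line (assumed format).
proc meanOfLines(path: string): float =
  var count = 0
  var total = 0.0
  # `lines` reads the file line by line, so memory use stays constant.
  for line in lines(path):
    let t = line.strip()
    if t.len == 0:
      continue          # skip blank lines
    total += parseFloat(t)
    inc count
  if count > 0:
    result = total / float(count)
```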
You only need an 8192-byte buffer (on a 64-bit CPU) for your large file, as long as you do not actually need all 16 GB of data in memory at once. Try:
const
  BufSize = 8192
  filename = "/home/king/test.txt"

var
  f: File
  buf: array[BufSize, char]

proc flushBuf(size: int) =
  # Only the first `size` bytes of `buf` hold data from the last read;
  # process buf[0 ..< size] here.
  if size > 0:
    echo "read ", size, " bytes"

if not open(f, filename, fmRead):
  raise newException(IOError, "could not open " & filename)

try:
  while true:
    # readBuffer returns the number of bytes actually read. It is never
    # negative, so no errno/EINTR check is needed here.
    let size = f.readBuffer(buf[0].addr, BufSize)
    flushBuf(size)
    # A short read on a regular file means the end was reached.
    if size < BufSize:
      break
finally:
  close(f)