When importing large data sets from disk into sequences, running out of memory kills the program.
Just wondering: has anyone worked on an extension (package) to the standard sequence that pages the sequence data to temporary disk files, so that as long as you have sufficient disk space you don't run into memory limitations (or at least reduce the risk)?
(I couldn't see anything in nimble)
Isn't that what the operating system should do automatically?
If you're dealing with that much data, you probably want to use a database or something.
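For example, with SQLite you can stream rows one at a time instead of loading everything into a seq. A rough sketch using the standard db_sqlite module (the database and table names here are made up; on Nim 2.x the module comes from the db_connector package):

import db_sqlite

# Open a hypothetical database file.
let db = open("bigdata.db", "", "", "")

# fastRows yields one row at a time, so the table never
# has to fit in memory all at once.
for row in db.fastRows(sql"SELECT date, price, volume FROM ticks"):
  echo row

db.close()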
Not exactly a replacement for sequences, but I have written a package to read and write teafiles.
These are files that consist of a header (containing some information about the type) and then a sequence of structs, as described in the header.
When writing, one usually appends at the end.
When reading, data are memory mapped, so they page to disk transparently thanks to OS support. If the opened file is stored in a var, one can also change specific values in place.
An example looks like this:
import teafiles

# This is our data type
type Tick = object
  date: int64
  price: float64
  volume: int64

# First we write data to a file...
let header = meta("tick data from NYSE") # you can add much more info
var file = create("ticks.tea", header)

# ...and stream data into it:
for tick in something: # `something` is any source yielding Tick values
  append[Tick](file, tick)
file.close()

# Read it back
var ticks = teafile[Tick]("ticks.tea")
for tick in ticks:
  echo tick

# We also have direct item access and length:
echo ticks[140]
echo len(ticks)

# Or we can modify it
ticks[139] = Tick(date: ..., price: ..., volume: ...)
ticks.close()
It is not a perfect substitute for sequences and surely deserves more documentation, but it could work depending on your use case. Also, be sure to check out the teafiles site: their documentation is pretty good, and the specification is very simple.
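For the curious: the memory mapping mentioned above is plain OS support, and you can play with the same mechanism directly via std/memfiles. A minimal sketch, assuming a hypothetical headerless ticks.raw file containing raw Tick structs:

import std/memfiles

type Tick = object
  date: int64
  price: float64
  volume: int64

# Map the file into memory; the OS pages data in and out on demand,
# so only the parts you actually touch need to fit in RAM.
var mf = memfiles.open("ticks.raw", mode = fmRead)
let n = mf.size div sizeof(Tick)
let items = cast[ptr UncheckedArray[Tick]](mf.mem)

for i in 0 ..< n:
  echo items[i]

mf.close()

The package adds the typed header and safe accessors on top of this kind of mapping.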