I'm not an expert on the internals of Nim, and while I had quite a bit of experience with C back in the '80s, I'm not going to claim to be a C internals expert either. In some of my current "pooling" libraries I've been making various assumptions without much context, so I'm not sure I've made the right choices.
So I'm asking the community a few questions, and asking them generically so that others can also get value from the feedback.
Just to ensure clarity, I'm going to precisely define what I mean by pooling in this context.
Essentially, a "pooled" resource is one that collects "stuff" at some global context and then allows later asynchronous threads to "borrow stuff" from that pool. It must allow this to happen without conflict (or with predictably handled conflict.)
Ideally, a pooled library also allows for limited management of that "stuff". That management can either be run as another thread/process or happen on demand, triggered by the behavior of the borrowing threads.
To give an example, I'll use my particular library for MongoDB access from a web server. Since it can take up to two seconds to connect (with auth) to a MongoDB server, you do not want to create a new connection for each new webpage. Pooling is needed. Example of use:
import jester, mongopool, bson
# mongopool at https://github.com/JohnAD/mongopool
# create a "universal pool" of pre-established connections; at least 3 but a max of 20.
connectMongoPool("mongodb://someone:[email protected]:27017/abc", minConnections=3)
# a jester web server, compiled with threads on:
routes:
  get "/":
    # a thread is started as the result of an incoming http connection
    # grab one connection from the pool:
    var db = getNextConnection()
    # do something with it.
    var doc = db.find("temp").returnOne()
    # release the connection back to the pool. (Yes, there could be fancier ways to handle this with templates etc.)
    releaseConnection(db)
    # sending the result to the web browser; ending the web thread.
    resp "doc = " & $doc
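As an aside on the "fancier ways" mentioned in the comment above: one option might be a small template that pairs the borrow with the release, so a connection is returned even when the body raises. This is only a sketch. `withConnection` is a hypothetical name and is not part of mongopool, and `getNextConnection` / `releaseConnection` are replaced by stand-ins here so the snippet runs on its own.

```nim
# Stand-ins so the sketch runs on its own; in real use these come from mongopool.
type FakeConn = object
  id: int

var checkedOut = 0   # number of connections currently borrowed

proc getNextConnection(): FakeConn =
  inc checkedOut
  FakeConn(id: checkedOut)

proc releaseConnection(c: FakeConn) =
  dec checkedOut

# Hypothetical helper (not part of mongopool): borrow, run the body, always release.
template withConnection(name, body: untyped) =
  let name = getNextConnection()
  try:
    body
  finally:
    releaseConnection(name)

proc handler(fail: bool): int =
  withConnection(db):
    if fail:
      raise newException(ValueError, "query failed")
    result = db.id

doAssert handler(false) == 1
doAssert checkedOut == 0        # released on the normal path
try:
  discard handler(true)
except ValueError:
  discard
doAssert checkedOut == 0        # released on the error path too
```

The point of the `try`/`finally` is that a route which fails halfway cannot leak a connection out of the pool.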
I've got about 6 web sites that run this way. Most of them are stable 😃. An example is at https://tech.radio . That one has been running with four-ish database connections for about 3 months. Literally, the four TCP sessions have not dropped that entire time and are shared among the incoming connections.
So, the general advice questions:
If an object must be used, should I avoid "ref objects"?
Objects by reference are more work for the garbage collector, but I would think passing an object to a thread by reference would be simpler and less intense.
Opinions?
# code to discuss
import net
type
  Connection = ref object # or just an object?
    sock: Socket
    status: string
  Pool = ref object # or just an object?
    connections: array[100, Connection]
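To make the difference being asked about concrete, here is a minimal sketch of the two behaviors. `ConnObj` and `ConnRef` are made-up names for illustration; the socket field is left out so the snippet has no dependencies.

```nim
type
  ConnObj = object        # value type: assignment copies the fields
    status: string
  ConnRef = ref object    # reference type: assignment copies the pointer
    status: string

var a = ConnObj(status: "idle")
var b = a                 # b is an independent copy of a
b.status = "busy"
doAssert a.status == "idle"

var r = ConnRef(status: "idle")
var s = r                 # s and r point at the same heap object
s.status = "busy"
doAssert r.status == "busy"
```

So with a plain `object`, handing a connection to another piece of code hands over a copy unless it is passed as `var` or by pointer, while a `ref object` is shared and its lifetime is tracked by the memory manager.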
Are sequences and tables inherently less safe?
If so, how much so?
# code to discuss
type
  Pool = ref object
    min: int
    max: int
    connections: seq[Connection] # or stick to arrays?
My particular driver is using the deques library; but perhaps there is a better approach? Again, please don't concentrate on my solution. What about in general?
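For the sake of discussion, here is a rough sketch of the array-based alternative: a fixed-size table of slots with a free-list, where every access goes through a lock. A `seq` can reallocate its storage when it grows, whereas a fixed array never moves, but in either case shared mutation from several threads still needs some form of synchronization. All names here (`Pool`, `borrow`, `giveBack`, `MaxConns`) are hypothetical, the "connections" are just slot indices, and older Nim versions need `--threads:on` for `std/locks`.

```nim
import std/locks

const MaxConns = 4

type
  Pool = object
    lock: Lock
    free: array[MaxConns, int]  # stack of free slot indices
    freeCount: int

proc initPool(p: var Pool) =
  initLock(p.lock)
  for i in 0 ..< MaxConns:
    p.free[i] = i
  p.freeCount = MaxConns

proc borrow(p: var Pool): int =
  ## Returns a free slot index, or -1 when the pool is exhausted.
  withLock p.lock:
    if p.freeCount == 0:
      result = -1
    else:
      dec p.freeCount
      result = p.free[p.freeCount]

proc giveBack(p: var Pool, slot: int) =
  ## Pushes a slot back onto the free stack.
  withLock p.lock:
    p.free[p.freeCount] = slot
    inc p.freeCount

var pool: Pool
initPool(pool)
let slot = pool.borrow()
doAssert slot == MaxConns - 1
pool.giveBack(slot)
doAssert pool.freeCount == MaxConns
```

This sketch only exercises the pool from one thread; it shows the locking shape, not a stress test.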
Is it better to have sockets be synchronous or asynchronous?
The threads using the pooled resource are already asynchronous in nature. So using async sockets means threads would be launching more threads (or am I wrong?). Not a huge problem, but it uses more resources for what appears to be little gain, if not an actual drop in performance.
Pros/cons?
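On the "more threads" worry: as far as I understand it, `std/asyncdispatch` runs futures on an event loop inside the calling thread rather than starting new threads. The sketch below is an assumption-laden illustration of that, using `sleepAsync` in place of real socket I/O; `Trace`, `worker`, and `main` are made-up names.

```nim
import std/asyncdispatch

type Trace = ref object
  events: seq[string]

proc worker(trace: Trace, name: string, delayMs: int) {.async.} =
  trace.events.add name & " start"
  await sleepAsync(delayMs)      # yields to the event loop while waiting
  trace.events.add name & " done"

proc main(trace: Trace) {.async.} =
  # Both workers start immediately and then wait concurrently.
  let slow = worker(trace, "slow", 100)
  let fast = worker(trace, "fast", 10)
  await slow
  await fast

let trace = Trace()
waitFor main(trace)
doAssert trace.events == @["slow start", "fast start", "fast done", "slow done"]
```

The two workers interleave even though nothing here creates a thread, so the cost of async inside a threaded server is mostly the extra event loop per thread, not extra threads.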
Is keeping all the pooled resources private to the module itself a good idea? Pros/Cons?
For example, with my library, one simply calls a generic connectMongoPool(uri) rather than receiving a passable object, as in var serverPool = connectMongoPool(uri). If it returned an object, one could have multiple pools connected to different database clusters. One could then get a connection with UFCS, like serverPool.getNextConnection().
I basically made the connections private out of paranoia. But that might be overblown.
Are there any dangers to making the "pooling object" accessible? (Assuming the properties themselves are private.)
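To illustrate what "accessible object, private properties" could look like, here is a minimal sketch. It is not how mongopool works today: `connectPool`, `inUse`, and the fields are hypothetical, the URIs are placeholders, and a "connection" is just a counter value. In Nim, a field without the `*` export marker is only visible inside the defining module, so callers can hold and pass the pool but cannot reach into it.

```nim
type
  Pool* = ref object          # the type itself is exported
    uri: string               # no `*`: private to this module
    borrowedCount: int        # no `*`: private to this module

proc connectPool*(uri: string): Pool =
  ## Hypothetical constructor returning a passable pool object.
  Pool(uri: uri, borrowedCount: 0)

proc getNextConnection*(p: Pool): int =
  ## Stand-in for borrowing; returns a ticket number instead of a socket.
  inc p.borrowedCount
  result = p.borrowedCount

proc inUse*(p: Pool): int =
  ## Read-only view of the private counter.
  p.borrowedCount

# Two independent pools, e.g. for two database clusters.
let poolA = connectPool("mongodb://cluster-a.example/abc")
let poolB = connectPool("mongodb://cluster-b.example/abc")
discard poolA.getNextConnection()
doAssert poolA.inUse == 1
doAssert poolB.inUse == 0
```

With this shape the caller decides how many pools exist and where they live, while the module still controls every mutation through its exported procs.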
By necessity and also for performance and GC portability I don't have any ref objects but if I have the option:
you can't solve the first 3 with garbage collected pool objects.
For pools that live as long as the application or a context, I prefer to manually allocate and deallocate them at context initialization so as not to burden the GC (this also avoids pools moving in memory under a copying GC).
Weave pools: