Re: memcache API happiness for C and Ruby users...: msg#00004

web.cache.memcached

Subject: Re: memcache API happiness for C and Ruby users...

If anyone has any neat tricks for doing this in parallel, drop me a
line offline. I refuse to do this in sequence/serially so any ideas
are going to require either async IO, or non-blocking IO. Ideally I'd
use pthreads, but that's less than portable (despite the p in
pthreads). Who knows, maybe I'll start abstracting IO handling and can
sneak in the use of kqueue(2) or libevent(3). Efficient reassembly

libevent only recently became safe to use in multi-threaded
applications. Even though libmemcache isn't multi-threaded, it's
currently thread safe (at least when used appropriately) and that's how I
use it so I'd like to see it stay that way. I've had serious problems
using libevent even for trivial tasks in a multi-threaded app (even
using it in only one of the threads). I'll admit I haven't tried the
latest release which fixes some issues. (it's only 2 days old!)

Heh. Given the quick adoption of memcache(3) in commercial applications, I'd be reluctant to bring in libevent(3), given that it uses a four-clause BSD license and I'm currently using the more permissive MIT license (haven't found a way to stick it to GPL users, but I'm still thinking). To be honest, I like the way that thttpd(8) abstracts its IO handling and may head down a similar road.

that's not O(N^2) is my main concern, however. Anything that's more
expensive than O(2N) isn't gunna cut it. Anyway, like I said, I'll
probably get to the scatter/gather portion of memcache(3) soonish. In
the meantime, enjoy the ruby bindings. I'll do some benchmarking later
to justify to myself the need to add Memcache::Request and
Memcache::Response classes (hint: need to prevent excessive object
creation in tight loops).

I've spent a fair amount of time thinking about this, and threads sure would make this easy, but I can only imagine the grumpy flames I'd get for that. I think this is the course of action I'm going to take:

*) Move the internal buffer from struct memcache to struct memcache_server. This will use more memory, but I can't think of an alternative that would let me safely handle multi-gets that span servers.

*) Create a set of IO routines that wrap select(2), poll(2), and kqueue(2) a la thttpd(8).

*) Add a void *misc struct member to struct memcache_re(q|s).

*) In mcm_get(), iterate through each key and find its server and group the keys into a new request that is dedicated for a specific server. Essentially this splices out and creates a struct memcache_req for each server.

*) Hand the request objects to a routine that iterates through the requests using non-blocking IO. Each request object has a state added to its flags to signify whether it is currently reading or writing. At the end of each iteration through the list of requests being processed, if a request hasn't finished its IO (reading or writing), register its file descriptor with the IO routines above and wait to read/write more data. Rinse/repeat.

*) Once all IO has been completed, begin reassembling data in the correct order, using the misc pointer in the internally used memcache_res objects that were dispatched to each of the servers.
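
The split and reassembly steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: struct res and struct server_req are hypothetical stand-ins for struct memcache_res and the per-server struct memcache_req, pick_server() is a toy hash rather than libmemcache's real server selection, and MAX_KEYS is an arbitrary cap. No network IO is shown; the val field is assumed to be filled in by whatever performs the reads.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 64  /* arbitrary per-server cap, for this sketch only */

/* Hypothetical stand-in for struct memcache_res. */
struct res {
    const char *key;
    const char *val;  /* filled in once the server's reply has been read */
    void *misc;       /* back-pointer to the caller's original res */
};

/* Hypothetical stand-in for a per-server struct memcache_req. */
struct server_req {
    struct res items[MAX_KEYS];
    size_t n;
};

/* Toy key -> server mapping; an assumption, not the library's hash. */
static size_t pick_server(const char *key, size_t nservers)
{
    unsigned long h = 5381;
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return (size_t)(h % nservers);
}

/* Pass 1: splice the caller's keys into one request per server. */
static void split_by_server(struct res *user, size_t nkeys,
                            struct server_req *per_server, size_t nservers)
{
    for (size_t s = 0; s < nservers; s++)
        per_server[s].n = 0;

    for (size_t i = 0; i < nkeys; i++) {
        size_t s = pick_server(user[i].key, nservers);
        struct server_req *sr = &per_server[s];
        assert(sr->n < MAX_KEYS);

        struct res *r = &sr->items[sr->n++];
        r->key = user[i].key;
        r->val = NULL;
        r->misc = &user[i];  /* remember where the answer belongs */
    }
}

/* Pass 2: hand each value back through misc. Every item is visited
 * exactly once and nothing is searched for, so this is O(N). */
static void reassemble(struct server_req *per_server, size_t nservers)
{
    for (size_t s = 0; s < nservers; s++) {
        for (size_t i = 0; i < per_server[s].n; i++) {
            struct res *orig = per_server[s].items[i].misc;
            orig->val = per_server[s].items[i].val;
        }
    }
}
```

Because each per-server item carries a pointer back to the caller's slot, the order in which servers finish does not matter; the caller's array keeps its original ordering without any sorting or lookup.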

The advantages to this are:

*) Under ideal circumstances, it iterates through each request only twice. Once when creating the appropriate struct memcache_re(q|s)'s, and once when assigning the values back to the appropriate memcache_res object that gets returned to the user via struct memcache_req.

*) If there is only one server, blocking IO can be used.

*) Reassembly is O(N).

*) Waiting on descriptors is abstracted, which lets folks who have kqueue(2) available use kqueue(2) and avoids select(2) unless you actually need it.
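
As a rough idea of what abstracted descriptor waiting might look like, here is a minimal sketch that hides poll(2) or select(2) behind one function. The name io_wait(), the enum io_want, and the USE_SELECT macro are all hypothetical (nothing here is libmemcache API), and a kqueue(2) backend is left out to keep the example portable.

```c
#include <assert.h>
#include <stddef.h>
#include <poll.h>
#include <sys/select.h>

enum io_want { IO_READ, IO_WRITE };  /* hypothetical: what the caller waits for */

/*
 * Hypothetical wrapper: block until fd is ready for the wanted operation
 * or timeout_ms elapses (a negative timeout waits indefinitely).
 * Returns 1 if the descriptor needs attention, 0 on timeout, -1 on error.
 * The backend is chosen at compile time via USE_SELECT (an assumed macro).
 */
static int io_wait(int fd, enum io_want want, int timeout_ms)
{
    assert(fd >= 0);

#if defined(USE_SELECT)
    fd_set set;
    struct timeval tv;
    struct timeval *tvp = NULL;

    FD_ZERO(&set);
    FD_SET(fd, &set);
    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }
    int rc = select(fd + 1,
                    want == IO_READ ? &set : NULL,
                    want == IO_WRITE ? &set : NULL,
                    NULL, tvp);
#else
    struct pollfd p;

    p.fd = fd;
    p.events = (want == IO_READ) ? POLLIN : POLLOUT;
    p.revents = 0;
    /* poll(2) may also report POLLERR/POLLHUP; the caller finds out
     * about those when it attempts the actual read or write. */
    int rc = poll(&p, 1, timeout_ms);
#endif

    return rc > 0 ? 1 : rc;
}
```

A request loop would attempt its non-blocking read or write first and call something like io_wait() only when the operation would block, so callers never see which backend was compiled in.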


And the disadvantages are:

*) Buffer size goes from only one per struct memcache object to one for every struct memcache_server object.


All things considered, I can't think of a better way to go that satisfies everyone's needs. Unless someone can sport a good reason, I'll make the IO selection routines configure-time dependent (i.e., pmk -e use_(kqueue|poll|select)). -sc


PS I probably shouldn't bury this here at the bottom of a technical email, but I figure it's the gearheads who'd read this that would be most interested. I'm looking to pick up roughly 20-40hrs of contract work in the month of January in the event that anyone has any memcache (C, PHP, Ruby, Perl, Java, etc. or general talks/lectures) or PostgreSQL needs. Email me offline if you're interested.

--
Sean Chittenden
