Re: A lurker's RT -- proactive cache: msg#00866
text.xml.cocoon.devel
Subject: Re: A lurker's RT -- proactive cache
Miles Elam wrote:
First, some background: The server's a RedHat box running Cocoon w/
Tomcat and TUX in front. TUX is a kernel-level, simple web server
(basically static content) that shoots pages out of the system page
cache at alarming speeds. With appropriate hardware, it copies data
directly to network interface buffers rather than to main memory
first, and if it can't serve the request, it passes control of the
socket to userland -- in my case, Tomcat/Cocoon on port 8080. This
also has the advantage of allowing Tomcat and the JVM to run as
"nobody" since TUX can listen on 80 and pass on to non-privileged
ports above 1024. But I digress...
The initial purpose of this setup was to allow Tomcat/Cocoon the
handling of dynamic data and TUX to handle static content -- their
respective strong suits. To put it in perspective, TUX can not only
saturate a 100baseT link on meager Pentiums but has extraordinarily low
CPU and memory usage. The less CPU/mem being used for static content,
the more Tomcat/Cocoon has for the dynamic stuff. This is the crux of
my current mindset. As I was working, I found myself wishing that
they could have their caches combined -- or rather that TUX could have
access to Tomcat/Cocoon's dynamic data cache. This was dismissed as a
pipe dream very quickly; native (kernel!) code tinkering around in a
userland JVM instance can hardly be called a well-considered design.
But then I thought, "Why not have Cocoon's cache output files to a
directory path that TUX can later serve?" However, this ran into the
problem of cache expiration. If Cocoon's cache (which from now on I'll
refer to as a "reactive cache") were to output a standard,
serialized file for TUX to serve, TUX would indeed serve it, but it
would no longer allow any requests to come into Tomcat/Cocoon. Thus
the reactive cache's entries would never be accessed again and would
never expire. Stale data would be served forevermore.
Assuming that you solve the expiration issue (as written below)... Still,
this Cocoon usage scenario is not applicable to 90+% of web sites,
where personalization, authorization, authentication, or just some
application logic is required. When a request is processed, Cocoon
components such as Actions, Matchers, and Selectors are executed. And
while the last two (usually) do not affect the system's state, the
first one (usually) does. That means it is not enough to just serve
content: the action must be executed as well.
For your scenario, maybe a cron job with a more efficient command-line
version of Cocoon will be enough. Or look at simpler solutions
like murka (http://murka.sourceforge.net/), which is adequate for
stateless content generation (XSLT on static XML files).
Vadim
So would there be a way to allow Cocoon to expire cache entries
immediately when prerequisite files were altered, database tables
updated, etc.? Not from within Java itself, no. Not without
excessive polling of the filesystem and other data stores. This would
affect the performance of the system so adversely as to make any
gains moot. So what about opening up that Pandora's box which
contains such items as "JNI code in Cocoon"?
...I'll wait for the jeering, insults, and epithets to die down...
Taken with the SGI library fam (File Alteration Monitor) for example,
it would be possible to expire a cache entry as soon as a prerequisite
was changed. In the case of fam, the serialized file in the document
tree would be deleted, causing TUX to route subsequent requests to
userland, where the page would be generated again. Obviously this
would be horrible for
absolutely dynamic data (for which any cache is merely a drain on the
system), but would be a performance giant (in theory of course) when
demonstrated against minimally dynamic content. It would have all of
the advantages of Cocoon CLI pre-generation while still allowing
dynamic updates.
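A minimal sketch of the expire-by-deletion idea above, in Java. It uses java.nio.file.WatchService (Java 7 and later) as a stand-in for fam, which is an assumption and not what the message proposes. The class ExportedCacheExpirer, its methods, and the source-to-export mapping are hypothetical and not part of Cocoon.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not a Cocoon class: maps each prerequisite file to the
// serialized page exported into the front-end server's document tree.
class ExportedCacheExpirer {
    private final Map<Path, Path> exportBySource = new ConcurrentHashMap<>();

    // Remember that 'exported' was generated from 'source'.
    void register(Path source, Path exported) {
        exportBySource.put(source.toAbsolutePath().normalize(), exported);
    }

    // Delete the exported page for a changed source. Returns true if a file
    // was removed, so the next request falls through to the app server.
    boolean expire(Path changedSource) {
        Path exported = exportBySource.remove(changedSource.toAbsolutePath().normalize());
        if (exported == null) {
            return false;
        }
        try {
            return Files.deleteIfExists(exported);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Block on change events for one directory and expire matching entries.
    // WatchService stands in for fam here; the loop runs until interrupted.
    void watch(Path sourceDir) throws IOException, InterruptedException {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            sourceDir.register(watcher,
                    StandardWatchEventKinds.ENTRY_MODIFY,
                    StandardWatchEventKinds.ENTRY_DELETE);
            while (true) {
                WatchKey key = watcher.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a fuller version would flush everything
                    }
                    expire(sourceDir.resolve((Path) event.context()));
                }
                if (!key.reset()) {
                    return; // directory is no longer accessible
                }
            }
        }
    }
}
```

The mapping is dropped on expiry, so the application would call register again after it regenerates and re-exports the page.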
I would also think it would improve the speed of Cocoon caches even
without TUX. After all, the cache would never even have to check the
filesystem. It could just say, "My in-memory bit hasn't been flipped
so I can assume that nothing has changed without any filesystem
calls." Think of the work done so that File.lastModified is called
less often. Wouldn't this simply be extending that thought to its
logical conclusion?
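The "in-memory bit" could look roughly like the sketch below. FlaggedCache and its methods are hypothetical names for illustration, not Cocoon's cache API. The point is only that a lookup consults a flag in memory and never touches the filesystem.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical cache, not Cocoon's: validity is a flag in memory that an
// external change event flips, so lookups never call File.lastModified().
class FlaggedCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final AtomicBoolean stale = new AtomicBoolean(false);

        Entry(V value) {
            this.value = value;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();

    // Return the cached value unless its flag was flipped; otherwise rebuild.
    V get(K key, Supplier<V> generator) {
        Entry<V> entry = entries.get(key);
        if (entry != null && !entry.stale.get()) {
            return entry.value; // hit: no filesystem call was needed
        }
        V fresh = generator.get();
        entries.put(key, new Entry<>(fresh));
        return fresh;
    }

    // Called by the change monitor (fam, a database trigger, and so on).
    void markStale(K key) {
        Entry<V> entry = entries.get(key);
        if (entry != null) {
            entry.stale.set(true);
        }
    }
}
```

This sketch ignores one race: a change that arrives while a page is being regenerated would be lost, so a fuller version would need to handle that case.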
Linux (2.4 kernel and higher), *BSD, Solaris, and IRIX all support
fam. I'm not certain about Solaris, but the others can use fam to
interact with kernel inode monitors without any regular polling. I
seem to remember that Windows has a similar filesystem event callback
system, but that may have been wishful thinking... Anyone know for
sure one way or the other?
Of course I saw the limitations of proposing a TUX-specific option.
After all, how many Cocooners out there have even given a second
glance to TUX -- or even a first? Aside from its use with Apache
through some hacking sessions, this idea could also be used for Squid
and other proxies. Imagine having front-end proxies that simply serve
content until they are explicitly told otherwise instead of having to
constantly make checkup calls.
On the other end, instead of fam, the same might be used for databases
as well. For example, PostgreSQL could have a trigger function
written so that, when a dependent set of tables is changed, a cache
expiration queue table is populated with the affected entries
and a linked C function (in the database) fires off a "cache
expired" message to the app server. But of course, this requires that
something like Cocoon have a socket listener or constantly open
connection to its data input source.
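On the Cocoon side, the socket listener might be sketched as follows. Everything here is an assumption made for illustration: the class ExpiryListener, the one-line "EXPIRE <cache-key>" wire format, and the plain Set standing in for the real cache. The database-side trigger and C function are not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Set;

// Hypothetical listener, not part of Cocoon. The wire format is an assumption:
// one line per message, "EXPIRE <cache-key>".
class ExpiryListener {
    private final Set<String> cachedKeys;

    ExpiryListener(Set<String> cachedKeys) {
        this.cachedKeys = cachedKeys;
    }

    // Read messages until end of stream; return how many entries were evicted.
    int process(Reader input) {
        int evicted = 0;
        try {
            BufferedReader reader = new BufferedReader(input);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("EXPIRE ")) {
                    String key = line.substring("EXPIRE ".length()).trim();
                    if (cachedKeys.remove(key)) {
                        evicted++;
                    }
                }
                // any other line is ignored in this sketch
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return evicted;
    }

    // Accept connections from the database side, one at a time. Closing the
    // server socket from another thread makes accept() throw and ends the loop.
    void serve(ServerSocket server) throws IOException {
        while (!server.isClosed()) {
            try (Socket client = server.accept()) {
                process(new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
            }
        }
    }
}
```

Keeping the message parsing in process, separate from the socket handling, means the same code could be fed from a constantly open connection instead of a listener.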
 +----------+---------+  +------------+-----+
 | Database | DB data |  | Filesystem | fam |
 |          | minder  |  |            |     |
 |          +----+----+  |            +---+-+
 +----------+    |       +------------+   |
                 |                        |
                 +------------+-----------+
                              |
+-------------------+---------+------------+
|      Cocoon       | Cocoon Cache Manager |
|                   +-----+----------------+
|                   |     | Cache Exporter |
+-------------------+     +----+-----------+
                               |
                               |
                        +------+------+
                        |             |
          +-------------+--+       +--+-------------+
          |   Filesystem   |       | Network Socket |
          +----------------+       +-------+--------+
                                           |
             +---------+             +-----+-----+
             |   TUX   |             |   Squid   |
             +---------+             +-----------+
Anyway, something like that. I hope my meager ASCII art skills can at
least get across the basic premise. What I think this entails
includes (1) better knowledge of how the Cocoon cache currently works
than I have at this moment, (2) a cache event generation interface
(implemented by a filesystem monitoring object that uses fam or
similar technology or a custom database trigger and Cocoon-side socket
listener), (3) a cache exporter interface (implemented by a filesystem
serializer or Squid-cache--aware network component), and a heaping
bowl of programming chutzpah.
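Items (2) and (3) could be sketched as a pair of small Java interfaces. All names below (CacheEventSource, CacheExporter, ProactiveCacheManager, ManualEventSource) are hypothetical and do not exist in Cocoon. The sketch only shows one possible shape for the wiring, not how Cocoon's cache is actually structured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// All names below are hypothetical; none of them exist in Cocoon.

// (2) Something that reports changed prerequisites: a fam wrapper, a
// database trigger listener, and so on.
interface CacheEventSource {
    void onInvalidate(Consumer<String> listener);
}

// (3) Something that publishes and withdraws generated output: a filesystem
// serializer for TUX, or a network component for Squid.
interface CacheExporter {
    void export(String key, byte[] content);

    void withdraw(String key);
}

// Wires the two together: every invalidation withdraws the exported copy.
class ProactiveCacheManager {
    private final CacheExporter exporter;

    ProactiveCacheManager(CacheEventSource source, CacheExporter exporter) {
        this.exporter = exporter;
        source.onInvalidate(exporter::withdraw);
    }

    // Called after the pipeline has produced fresh output for a key.
    void store(String key, byte[] content) {
        exporter.export(key, content);
    }
}

// Trivial event source for trying the wiring without fam or a database.
class ManualEventSource implements CacheEventSource {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    @Override
    public void onInvalidate(Consumer<String> listener) {
        listeners.add(listener);
    }

    void fire(String key) {
        for (Consumer<String> listener : listeners) {
            listener.accept(key);
        }
    }
}
```

Because the manager sees only the two interfaces, a TUX-oriented filesystem exporter and a Squid-oriented network exporter could be swapped without touching the event side.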
Obviously my focus would be on getting TUX to work rather than on
Squid as I use the former and not the latter. What it really comes
down to is that Squid is another box(es) to purchase and maintain and
I am cheap/poor/bored (take your pick). But I would just as soon make
interfaces that weren't intimately tied to the filesystem and my
specific goals (hence the discussion about Squid and databases).
Of course, if Sun added a filesystem callback interface to the I/O
libraries, the native code wouldn't be necessary. ;-) Sure, not
every OS supports it, but then not every platform supports threads
either. When not present, you emulate, right?
Any thoughts? Am I absolutely off my rocker and in need of
medication? Is it something that you are already working on? I've
given my thoughts on advantages of a proactive cache. What are the
drawbacks to a proactive cache rather than a reactive one (other than
simplicity and the fact that the reactive one is already there and
works -- already got that)?
More info on fam: http://oss.sgi.com/projects/fam/faq.html
More info on TUX: http://www.redhat.com/docs/manuals/tux/TUX-2.2-Manual/
Thanks for your time,
Miles