
Re: RFC HTTP::Cache module: msg#00042

lang.perl.modules.lwp

Subject: Re: RFC HTTP::Cache module

Thank you for your comments! I have responded inline.

On Mon, 2004-09-27 at 18:58, Ofer Nave wrote:
> 1) It sucks having to re-implement a subset of LWP::UserAgent parameters
> in your module (like UserAgent). Even if you're simply passing them
> along verbatim to the UserAgent constructor, you still have to provide
> some documentation in your module, and you can't possibly cover all of
> the params. You could simply say that params get passed through to
> LWP::UserAgent, I suppose.

Hmm, what about changing the UserAgent option to take an actual
LWP::UserAgent object instead? That gives complete flexibility with no
code duplication.
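
A minimal sketch of what such a constructor could look like. The
package name CacheSketch and the user_agent accessor are made up for
illustration and are not the actual HTTP::Cache code; only the
UserAgent option name comes from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the real HTTP::Cache implementation.
package CacheSketch;

use Carp qw(croak);
use Scalar::Util qw(blessed);

sub new {
    my ($class, %args) = @_;

    # Use the caller's agent if one was given; otherwise fall back to a
    # default LWP::UserAgent (loaded lazily, only when needed).
    my $ua = $args{UserAgent};
    if (!defined $ua) {
        require LWP::UserAgent;
        $ua = LWP::UserAgent->new;
    }

    # Only require the one method the cache would call, so subclasses
    # and test doubles work too.
    croak "UserAgent must be an object with a get() method"
        unless blessed($ua) && $ua->can('get');

    return bless { ua => $ua }, $class;
}

# Accessor for the agent that will perform the requests.
sub user_agent { return $_[0]{ua} }

package main;

# Typical use: configure the agent with LWP's own options, then hand it over.
#   my $ua    = LWP::UserAgent->new(agent => 'MyApp/0.1', timeout => 10);
#   my $cache = CacheSketch->new(UserAgent => $ua);
```

With this shape, every LWP::UserAgent setting (agent string, timeout,
proxy and so on) stays documented in one place, in LWP itself.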

>
> 2) If you try to over-simplify the process, you eliminate the option of
> using all less-simple-than-simply-calling-get() functionality in the
> libwww module. Eventually people will want to be able to cache posts,
> or check the http status code of the response, and other such things,
> and you will be busy re-implementing everything that's already implemented.
>

In general, I think that only "simple" GET requests can be cached. POST
requests, anything involving cookies, etc. usually cannot be cached,
since the response is generated dynamically and the server does not
implement proper cache control for these responses; it simply says that
the response is new every time.

The interface (i.e. "get") is the same as that provided by LWP::Simple,
but with the added bonus that you get access to any error codes returned
by the server if you want them. I think this covers the majority of
the use cases. If anyone needs it, I can always add a more versatile
interface later that allows you to do more things (perhaps with complete
HTTP::Request and HTTP::Response objects), but that kind of interface
will probably be too complicated in many cases.

> How about instead of providing "get" methods and returning "content"
> directly, you integrate properly into the libwww module and cache/return
> HTTP::Response objects? You can still key on the url (ignoring the
> parameters, unlike Apache::DBI), although POST content might need to be
> part of the cache key.
>
> Perhaps you could make HTTP::Cache one of those "magic" modules that if
> you simply "use" it, or load it and set a global variable, caching
> starts happening automagically (in the background you could override a
> few pieces of libwww to insert the caching in the appropriate place -
> should be fairly seamless).

The HTTP::Cache module is currently roughly 110 lines of code (not
counting documentation and blank lines). Integrating it into libwww
seems like a lot more work to me since I'm not familiar with the inner
workings of LWP.

>
> 3) Some global cache configuration options would be nice (instead of
> per-request). You could look at squid as a model (squid being the
> premier open source web caching application), but off the top of my head:
>
> a) set a max-live time (global, or per mime-type, or per domain.... you
> can get as fancy as you dream)
> b) turn on/off depending on verb (like GET, POST) or if query-string
> params detected
> c) set default "expires" time if the web server doesn't offer one
> d) whether or not to even bother trying to HEAD the url or just go
> straight for the goods
> e) yes, a user-agent string
>

All configuration is per HTTP::Cache object. This object can be used to
perform several get requests, so the configuration is not per request.

Currently, all requests are always checked against the HTTP server, so
there is actually nothing to configure regarding expiry times etc. If
the server thinks that the cached copy is up-to-date, we will use the
cache. What happens is that I send a normal HTTP request but include the
headers If-None-Match (carrying the cached ETag) and If-Modified-Since.
If the server thinks that the ETag still matches and/or the content has
not been modified since the date provided, it will return a response
code (304 Not Modified) saying that the cache is up-to-date. Otherwise,
it will return the complete response as normal. So there is no HEAD
request involved at all.
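
The revalidation step described above could be sketched roughly as
follows. This is an illustration under assumptions, not the module's
code: cached_get and the in-memory hash cache are hypothetical, and $ua
is anything that behaves like an LWP::UserAgent. The cached ETag is
sent back in an If-None-Match request header and the cached
Last-Modified date in If-Modified-Since.

```perl
use strict;
use warnings;

# Hypothetical helper. $ua is an LWP::UserAgent-like object and $cache
# is a plain hashref keyed by URL. Returns (content, status code).
sub cached_get {
    my ($ua, $cache, $url) = @_;
    my $entry = $cache->{$url};

    # Send back the validators the server gave us last time, if any.
    my @headers;
    if ($entry) {
        push @headers, 'If-None-Match' => $entry->{etag}
            if defined $entry->{etag};
        push @headers, 'If-Modified-Since' => $entry->{last_modified}
            if defined $entry->{last_modified};
    }

    # One ordinary GET; extra arguments to get() become request headers.
    my $res = $ua->get($url, @headers);

    # 304 Not Modified: the server sent no body, so reuse the cached one.
    if ($entry && $res->code == 304) {
        return ($entry->{content}, $res->code);
    }

    # Successful full response: remember body and validators for next time.
    if ($res->is_success) {
        $cache->{$url} = {
            content       => $res->content,
            etag          => scalar $res->header('ETag'),
            last_modified => scalar $res->header('Last-Modified'),
        };
        return ($res->content, $res->code);
    }

    # Any other status: no content, but the caller still sees the code.
    return (undef, $res->code);
}
```

A caller would write something like
my ($content, $code) = cached_get($ua, \%cache, $url); and check $code
only when it cares why $content is undefined.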

/Mattias




