"David Carter" <david@xxxxxxxxxx> writes:
> My understanding had always been that content-encoding (when talking about
> compression) is in practical terms no different than transfer-encoding. LWP
> already handles transfer-encoding (gzip or deflate), so what's the big deal
> about it also handling content-encoding compression in a transparent manner?
Transfer-Encoding and Content-Encoding work at different levels in the
HTTP protocol. It makes perfect sense to handle Transfer-Encoding
transparently in a client library. It does not make sense to try to
hide Content-Encoding in the same way.
> My suggestion would be to make it the default to handle it transparently,
> but provide an option to turn it off if someone needs access to the raw
> datastream. All GUI browsers "just do it" - the user doesn't have to be
> concerned with either content-encoding or transfer-encoding.
I disagree. LWP is not a GUI browser and should not hide
content-encoding by default.
> If you have a file in .tar.gz format, the web server should NOT return a
> content-encoding: gzip header.
Sure it should. Especially if the Content-Type header describes the
type of document you end up with after you 'gunzip' it.
> This would incur redundant processing costs
> on the server & the client, attempting to re-compress an already compressed
> file for little or no gain. Instead, the server would send an appropriate
> mime type indicating to the client that this is a compressed archive file
> (usually handled in a GUI client by presenting a file download dialog box).
I disagree here, but I'm sure practice differs among servers. Apache
seems to serve .tar.gz files as:
  Content-Type: application/x-tar
  Content-Encoding: x-gzip
and I think that is exactly as it should be.
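To illustrate the layering outside of LWP, here is a minimal Python
sketch. It is not LWP code; the helper name undo_content_encoding and
the sample file are assumptions made up for the example. It builds a
small tar archive, gzips it the way a server would hold a .tar.gz file,
and checks that removing the Content-Encoding leaves exactly the type
named by Content-Type.

```python
import gzip
import io
import tarfile


def make_tar(name: str, data: bytes) -> bytes:
    """Build an in-memory tar archive holding a single file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def undo_content_encoding(headers: dict, body: bytes) -> bytes:
    """Hypothetical helper: strip a gzip Content-Encoding, if present."""
    encoding = headers.get("Content-Encoding", "").lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body  # no (known) content-encoding: leave the body alone


# What the server sends for a .tar.gz file, as in the headers above.
tar_bytes = make_tar("hello.txt", b"hello\n")
headers = {"Content-Type": "application/x-tar", "Content-Encoding": "x-gzip"}
body = gzip.compress(tar_bytes)

# After undoing the content-encoding, the body is a plain tar archive,
# which is what the Content-Type header promised.
decoded = undo_content_encoding(headers, body)
with tarfile.open(fileobj=io.BytesIO(decoded), mode="r:") as tar:
    names = tar.getnames()
```

The mode "r:" asks tarfile to accept only an uncompressed archive, so
the final step would fail if the gzip layer were still in place.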
> It may not be what the RFCs originally intended, but modern web server
> implementations of on-the-fly compression in my experience always use
> content-encoding rather than transfer-encoding.
Could it have something to do with what MSIE implements?
> I've written a server-side
> plug-in to do this on the Netscape/iPlanet web server, and have done fairly
> extensive research on what's out there in Apache, etc.
I'm not opposed to adding stuff to LWP that lets you undo
Content-Encoding, but it needs to be enabled explicitly to stay
backwards compatible.
LWP currently has code that tries to parse the head section of
text/html documents to extract headers, meta and the base. This code
fails when the document is compressed, so there is actually a need for
undo-content-encoding support in the LWP core.
I think most users would be served well with an option that simply
tells LWP to try to undo content-encoding for any text/* content, but
I'm also thinking that LWP should have some kind of generic filtering
mechanism similar to Perl's IO layers. That should be able to deal
with content-encoding and might even turn the content into Unicode
strings and similar based on the charset parameter.
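A rough Python sketch of what such a layered filter could look like
follows. Nothing here is an existing LWP interface: the names DECODERS,
content_layers, charset_of and decode_body are invented for the
example, and the iso-8859-1 fallback charset is an assumption. It
undoes the content-encodings in reverse order for text/* content only,
then decodes the bytes to a Unicode string using the charset parameter.

```python
import gzip
import zlib

# Hypothetical registry mapping content-coding names to decoder functions.
DECODERS = {
    "gzip": gzip.decompress,
    "x-gzip": gzip.decompress,
    "deflate": zlib.decompress,  # assumes zlib-wrapped deflate data
    "identity": lambda data: data,
}


def content_layers(headers):
    """Return the decoders for Content-Encoding, in the order to apply."""
    value = headers.get("Content-Encoding", "")
    codings = [c.strip().lower() for c in value.split(",") if c.strip()]
    # Codings are listed in the order they were applied, so undo them
    # in reverse. An unknown coding raises KeyError.
    return [DECODERS[c] for c in reversed(codings)]


def charset_of(headers, default="iso-8859-1"):
    """Extract the charset parameter from Content-Type, if any."""
    for part in headers.get("Content-Type", "").split(";")[1:]:
        key, _, val = part.partition("=")
        if key.strip().lower() == "charset":
            return val.strip().strip('"')
    return default


def decode_body(headers, body, text_only=True):
    """Undo content-encoding and charset for text/*; else return bytes."""
    ctype = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if text_only and not ctype.startswith("text/"):
        return body  # leave non-text content (e.g. a tarball) untouched
    for layer in content_layers(headers):
        body = layer(body)
    return body.decode(charset_of(headers))


html_headers = {"Content-Type": "text/html; charset=utf-8",
                "Content-Encoding": "gzip"}
raw = gzip.compress("<title>caf\u00e9</title>".encode("utf-8"))
text = decode_body(html_headers, raw)
```

With the text_only default, a response served as application/x-tar
with Content-Encoding: x-gzip comes back as the raw compressed bytes,
so the existing behaviour is kept for non-text content.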
Regards,
Gisle