Re: Why MD5 Headers are Imperative: msg#00415

network.syndication.atom.protocol

Subject: Re: Why MD5 Headers are Imperative



>-----Original Message-----
>From: Julian Reschke [mailto:julian.reschke-Mmb7MZpHnFY@xxxxxxxxxxxxxxxx]
>Sent: Thursday, August 17, 2006 12:53 AM
>To: eric-MkmoNbc1SAncr/OS1auqaA@xxxxxxxxxxxxxxxx
>Cc: atom-protocol-O6w3ZxSwtmQ@xxxxxxxxxxxxxxxx
>Subject: Re: Why MD5 Headers are Imperative
>
>Eric J. Bowman schrieb:
>>> Well, no matter what APP says, HTTP GET and PUT *always* transfer
>>> representations. That's by design in HTTP, and APP can't change that.
>>> That being said, I really don't understand how adding the sentence above
>>> changes anything.
>>>
>>
>> It doesn't change anything. What it does is acknowledge that the readable
>> and the writable representations of the resource are not, in Atom format,
>> necessarily identical.
>
>Well, I think everything will be much simpler if they are. It seems to
>me that you're trying to optimize for a use case that doesn't need
>optimization.
>
>>>> Because WebDAV correctly assumes this determination is up to the editor.
>>>> Since this is a perfectly reasonable fallback means of updating an Atom
>>>> Entry Document when it is desirable to give this control to the editor,
>>>> all the more reason to make sure that APP supports Atom-specific
>>>> manipulations.
>>> Hu?
>>>
>>> The ETag always is assigned by the server. There's no way in plain HTTP
>>> or WebDAV to update a resource via PUT and let the editor claim that
>>> it's an insignificant change.
>>>
>>
>> An editor can use WebDAV to PUT a new resource at a location, including a
>> new <updated> value, regardless of what any publishing system wants. You're
>> getting my point backwards, btw. I'm saying exactly what you just said --
>> there is indeed no way in plain ol' HTTP to let the editor claim it's an
>> insignificant change -- _all_ edits MUST count as significant. Atom Format
>> says _no_ edits MUST count as significant.
>
>Yes. So ETag (as HTTP cache validator) and app:updated (as something
>that can signal whether a feed reader should "redisplay" the entry) are
>very different things. This seems to be perfectly ok.
>
>>> If the server decides that the new content isn't really different (for
>>> instance because whitespace between XML attributes was changed, but the
>>> server stores the data in an XML-specific store anyway), it has the
>>> choice not to assign a new ETag.
>>>
>>
>> That's exactly right. The editor has no way of knowing if his most recent
>> change resulted in a new cache-control setting. So how to confirm the edit
>> on retrieval? By forcing a revalidation. Unless the server hasn't changed
>
>Why would it need to know? You do a PUT, the server says 200, so the
>content has been stored. If you're lucky, you also get a new ETag in the
>response.
>

Maybe I'm misreading the spec, but I thought it said after a PUT the response
should include the Content-Location of the Member URI.

If an editor is changing something the server deems insignificant, then that
change is not propagated to the read-only representation, which stays current
and unchanged by the edit. Since the intent was to alter the _writable_
representation, which is separately cached with my solution, the client may
dereference the URL in Content-Location _with_ an "X-APP-MD5" header, which
will retrieve the updated writable representation.
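
To make the dispatch I'm describing concrete, here is a rough sketch in
Python. Everything in it is hypothetical: the function name, the two
dictionaries standing in for the separate caches, and the choice to key only
on the _presence_ of "X-APP-MD5" (what value the header carries is not
covered here).

```python
def select_representation(request_headers, readonly_cache, writable_cache, member_uri):
    """Pick which cached representation of a Member resource to serve.

    Assumption for illustration: a request carrying an "X-APP-MD5" header
    comes from an APP client that wants the writable representation; any
    other request gets the read-only one. Header names are compared
    case-insensitively, as HTTP requires.
    """
    header_names = {name.lower() for name in request_headers}
    if "x-app-md5" in header_names:
        return writable_cache[member_uri]
    return readonly_cache[member_uri]
```

An ordinary reader's GET never touches the writable cache, so an
insignificant edit leaves what readers see (and its cache validators) alone.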

>> its cache-control settings, because the server owner/publisher has deemed it
>> an insignificant change. In which case what does the user do with what
>> appears to be a lost edit, when in fact it was accepted with the appropriate
>> 200 OK response but whose message body contains the same cached version the
>> editor started with?
>
>Why would it contain that? I'm not sure why that would happen.
>

Because the server never transforms the insignificant edit into the static,
read-only version, whose caching is based on ETag=atom:id+atom:updated. It
never updates that ETag, so even a revalidation request returns 304.
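
A minimal sketch of that behavior, in Python. The Entry class, the function
names, and the exact way id and updated are joined into the ETag string are
all made up for illustration; the only thing taken from the discussion is
that the validator is derived from atom:id plus atom:updated.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    atom_id: str   # value of atom:id
    updated: str   # value of atom:updated (RFC 3339 timestamp)
    content: str


def etag_for(entry):
    # Hypothetical format: quoted string built from id and updated only.
    return '"%s;%s"' % (entry.atom_id, entry.updated)


def conditional_get(entry, if_none_match):
    """Return (status, etag, body) for a GET with an optional If-None-Match."""
    current = etag_for(entry)
    if if_none_match == current:
        return 304, current, None      # validator matches: nothing is resent
    return 200, current, entry.content


def insignificant_edit(entry, new_content):
    # The content changes, but atom:updated is deliberately left alone.
    return Entry(entry.atom_id, entry.updated, new_content)
```

After an insignificant edit the ETag is the same string as before, so a
client revalidating with its old validator gets 304 and keeps its cached
copy; only a change to atom:updated produces a new validator.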

>>>> Here's the example: I'm generating summaries elsewhere on my site using
>>>> the placement of the <p/> tag (just for the sake of the example, although
>>>> re-serializing the XML is another example of an insignificant change) but
>>>> this has absolutely no effect on any request for the full-content
>>>> representation.
>>> Clarifying: you mean for those who read a transformed HTML version?
>>>
>>
>> Not just them, what about everyone who subscribes to the feed whose content
>> and updated values haven't changed but whose eTag (mysteriously to them)
>> has? Only the writable GET representation of the Member resource needs to
>> be changed in the case of insignificant edits, not the
>> otherwise-perfectly-cacheable read-only representation.
>
>Well, you will have clients refetch the contents although this wouldn't
>have been necessary. It will not affect the user experience, except for
>additional traffic/delay. How frequently are you doing these kinds of
>changes requiring the protocol to come up with a custom "fix" for that?
>

"Except for additional traffic and delay" for one user turns into a real
scalability nightmare on the server. What I'm saying is that this problem
will be so frequent that the spec is addressing the 20, not the 80, of 80/20.
Because the least-frequent GET request will be coming from an APP client, why
should the least-likely use case invalidate the cache strategy which was
designed on the premise of the most-likely use case of read-only?

>>> It will be considered a content change on the HTTP level, that is the
>>> content is refetched; but on the Feed Reading level, it shouldn't show
>>> up unless atom:updated indeed changed as well. That seems to be
>>> completely ok to me.
>>>
>>
>> At the <feed> level everything's AOK. But, when the feed reader fetches
>> that standalone <entry> it's going by the cache-control headers. So why
>> should an insignificant edit that this user won't even see reflected in his
>> representation be forced to round-trip my server when what he gets is no
>> different than what the last user got? See my scaling problem here? Most
>> web visits aren't by editors, they're by readers, so you can't invalidate
>> the reader cache for insignificant edits without breaking any Atom
>> implementation which uses <updated> as the basis for cacheing.
>
>I think I now do understand what's going on, but I'm absolutely not
>convinced that it's a problem in practice.
>
>>> I do agree that id+updated doesn't make a very good ETag, so why don't
>>> you assign something better?
>>>
>>
>> What I'm saying is that id+updated makes a _perfectly_ good eTag, if I
>> wasn't trying to implement APP alongside it what would be wrong with it?
>> Please don't make Atom Format implementers break their cache strategy in
>> order to support the Atom Publishing Protocol.
>
>HTTP defines how an ETag needs to behave. Well, at least for strong ETags.
>
>If you use id+updated as a strong ETag, and do non-significant edits on
>the resource, causing the ETag to stay the same, then you're not
>compliant with HTTP. Thus, id+updated can't be used as ETag unless you
>change "updated" with every change, which would defeat its original
>purpose.
>

Check your assumptions, please. :) If I set up a read-only Atom
implementation then I'm not doing _any_ edits, am I? Therefore the logical,
intuitive and obvious cacheing solution for this use case is eTag=atom:id+
atom:updated, in perfect compliance with HTTP 1.1.

All you're saying is if I want to comply with APP I have to de-comply with
HTTP, or re-code my entire application based on the assumption that <updated>
means something different than what RFC 4287 says it means.

-EJB

>
>Best regards, Julian
>
>
>




