Re: CML RSS (msg#00040, science.chemistry.blue-obelisk)
Hi Henry, all,

Thanks for the comments, it's been helpful to see where CMLRSS is coming from. I still think enclosures are needed to take things forward.

A quick scenario that hopefully shows why I think enclosures are a much better idea: let's suppose I subscribe to a feed that's 15 items long, where each item includes a 1MB CML file. I poll the feed every hour for updates. Let's say that client and server are well written, and use If-Modified-Since, so if there are no changes I don't download the feed. Using inline CML, if one item changes, I've got to retrieve the whole feed again, 14MB of which I already have. Using enclosures I retrieve a few KB and then the 1MB file. N.B. filtering the inline CML doesn't help me here - I still have to get the lot.

More discussion inline:

Rzepa, Henry wrote:
At 12:09 +0100 24/5/07, Jim Downing wrote:

This is a valid point - it wouldn't be as easy to filter enclosures (although it's not much harder), and I take Daniel's point that filtering needs to be done on the data.... but... do we really need megabytes of data in the stream so we can filter on molecular formula or a connection table? Couldn't the CMLRSS have an extract inline and then an enclosure link to the full data?

Another intent of CMLRSS is that it would in fact

This is a strawman: the concept of streaming an entire database is not required for enclosures to be a useful development. We (Nick Day and myself) are having performance problems now, using CMLRSS to do what it was designed to do. The CMLRSS Nick generates from a single edition of Acta E is somewhere between 20MB and 40MB. This occupies over 64MB of RAM using a DOM approach - Nick has had to resort to using StAX to generate the file. I know this doesn't sound like much, but in a server environment you don't need too many requests like that in parallel to cause difficulties. Yes, these files contain more than 15 molecules, because each issue of Acta E contains more than 15 molecules.
I suppose we could "save up" the updates and let them out every other day in batches of 15, but I feel that's a pound of complexity in implementation for the sake of a penny of simplicity in the protocol. As we all know, CML files can get a lot bigger than the ones Nick's generating.

Granted, if it's used to provide a feed for 1000s of molecules or a smaller

I'm no expert, but the podcasts I've seen have all been delivered as enclosures. Speaking as a part-time sysadmin, I'd prefer the load an enclosure-based approach generates (content spread over many short-lived connections) to that of an inline approach (much content over a single long-lived connection).

RSS 1.0 which was used for CMLRSS does not in fact support the concept

Atom specifies that any foreign markup must be tolerated, so RDF/XML should be fine.

Daniel Zaharevitz wrote:

<snip/>

I also echo Henry's misgivings about moving the actual chemical information out of the main feed. The most important summary IS the chemical structure (in the way I think chemists would be interested in using it) and while there is no big problem with having a display of structures with a "click here for additional data", I think it would be pretty useless to have a list of "new compound", "new compound", etc. with a "click here to see structure".

<snip/>

It's not a case of clicks - a CMLRSS client would have to know what to do with the enclosure just the same as it currently has to know what to do with the inline data. Programmatically the differences aren't great.

Best regards,

jim
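
For illustration only, the polling scenario at the top of this message can be put into numbers. The sketch below is not part of CMLRSS or any feed library: the function names are hypothetical, and the 5 KB size for an enclosure-only feed document is an assumed stand-in for "a few KB". It also ignores the small per-item metadata an inline feed would carry.

```python
# Rough transfer cost per poll when exactly one item in the feed has changed.
# Assumptions (not from any spec): 1 MB = 1024 * 1024 bytes, and an
# enclosure-only feed document of about 5 KB ("a few KB" in the message).

MB = 1024 * 1024
KB = 1024


def inline_transfer(items: int, item_bytes: int) -> int:
    """Inline CML: any change invalidates the feed, so every item is re-fetched."""
    return items * item_bytes


def enclosure_transfer(changed: int, item_bytes: int, feed_bytes: int) -> int:
    """Enclosures: re-fetch the small feed document plus only the changed files."""
    return feed_bytes + changed * item_bytes


items = 15            # feed length in the scenario
item_size = 1 * MB    # one CML file per item
feed_size = 5 * KB    # assumed size of the feed without inline CML
changed = 1           # a single item was updated since the last poll

inline = inline_transfer(items, item_size)
enclosure = enclosure_transfer(changed, item_size, feed_size)
redundant = inline - changed * item_size  # bytes the client already had

print(f"inline:    {inline / MB:.2f} MB ({redundant / MB:.0f} MB redundant)")
print(f"enclosure: {enclosure / MB:.2f} MB")
```

Under these assumptions the inline feed re-sends 14 MB the client already holds, while the enclosure approach transfers just over 1 MB; in both cases If-Modified-Since only saves bandwidth when nothing at all has changed.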