Re: Thought on future of XMLC: msg#00120 (java.enhydra.xmlc)
I agree with Mark completely on the subject of valid HTML. There is a standard, and it is not that hard to follow. It perhaps left a few things up to the user that would better have been spelled out, but it is workable. I do understand that folks 'grew up' without understanding it, but it isn't very hard to get used to doing the valid stuff. A slight problem I have with JTidy is that in a few places it 'has an opinion' where it need not, e.g. the table summary requirement, and that causes more trouble than it should.

I have a little 'peccadillo' about XML, in fact. In my classes on XML I talk about "valid" documents when there is a grammar to specify validity, but I make it a point not to talk about "well-formed" XML, explaining why: if it ain't "well-formed" then it ain't XML at all, it is something that looks sort of like it. I think this encourages students to realize that there is a standard, not at all complex (except for a few rarely encountered situations). In a similar manner we do not talk about "well-formed" C or C++ or Java, and in fact in the first two cases there would be more justification, since "the rules" are not totally laid down. We all understand "it ain't a C program unless the compiler agrees that it is".

A little attention paid to this in the early days of HTML would be paying great dividends now, in helping us to convert the masses of existing "quasi-HTML" documents to something more manageable and manipulable.

This whole XMLC discussion is great, by the way, and my thanks to those who are participating. It is very nice to see folks discussing some of the real choices that could be and will have to be made in the future. And special thanks to those patient folk who are actually implementing it. It is a great idea which IMHO still has a lot of life. Although I am not myself an active developer, I always show it to students as an outstanding example of a way to "keep the spaghetti noodles straight".
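As a rough illustration of the "if it ain't well-formed then it ain't XML" point above, here is a minimal sketch using only the XML parser that ships with the JDK (not XMLC or JTidy). The class name `WellFormedCheck` and the method `isWellFormed` are made up for this example; the idea is simply that a conforming parser refuses outright to build a document from text that is not well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper for illustration only; not part of XMLC or JTidy.
public class WellFormedCheck {

    /** Returns true only if the JDK's XML parser accepts the text as XML. */
    public static boolean isWellFormed(String text) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow parse problems instead of letting the default
            // handler print them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (Exception e) {
            // Any parse failure means the text is not XML at all.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>closed<br/></p>"));   // true
        System.out.println(isWellFormed("<p>unclosed<br></p>"));  // false
    }
}
```

There is no "mostly XML" result here: the parser either hands back a document or it does not, much as a compiler either accepts a C program or rejects it.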
----- Original Message -----
From: "Mark Diekhans" <markd@xxxxxxxxxxx>
To: <xmlc@xxxxxxxxxxx>
Sent: Thursday, November 28, 2002 3:08 PM
Subject: Re: Xmlc: Thought on future of XMLC

>
> Hi Arno!
>
> Arno Schatz <list@xxxxxxxxxxxxx> writes:
> > We are setting the bar a bit higher if we require html to be correct to the
> > spec. Most html developers can live without the spec until they get to know
> > XMLC and JTidy. On my other project, I am working with an external html
> > design studio. Even though they are very capable, I again have to explain
> > what correct HTML is and what is not.
>
> Is it too much to ask people to know how to use their tools? You are doing
> the internet a big favor by forcing them to know how to write correct
> HTML. Sorry you have to deal with this attitude, it sounds frustrating.
>
> It was a major mistake for browsers to ever support invalid html. It's why
> there is so much invalid HTML and why it's so hard to make compatible pages,
> develop new web browsers, write tools that analyze and index HTML. If
> browsers had simply rejected bad html, there wouldn't be any.
>
> > Technically XMLC does not need to do that either, if it adopts the concept
> > of not changing more of the original html than necessary.
>
> Actually, it does. XMLC transforms the document into an object hierarchy; the
> original document is gone. The DOM does not represent the formatting of the
> text file that was input, it represents the data in it.
>
> There is no way to produce the original formatting, even if valid, only the
> original meaning. If invalid, it leads to impossible-to-handle cases. For
> instance, I have seen <BR></BR> used. This is not valid html, and doesn't
> render the same as <BR>. Yet there is no way for XMLC to represent this in
> the DOM. It would always output <BR>. Modifying the DOM to be able to
> represent any type of invalid html would be a monumental undertaking.
>
> > However, it insists on generating the HTML from nodes even if they were not
> > changed (instead of taking the original HTML from the file), which makes it
> > slower and IMHO more difficult to use.
>
> This seems very hard to implement and would result in something complex and
> unpredictable. XMLC has no way to know what is going to be modified. It
> would have to know how to get back to the original parts of the document for
> every subtree, however defining the subtrees means parsing the document. The
> rendering of parts of the document would be different depending on whether it
> was coming from the DOM or the document. If this is really needed, I think it
> indicates that XMLC is fundamentally flawed.
>
> > One great thing about xmlc is that it lets the html developer work the way
> > they are used to (with static html). This being said, I don't think we
> > should force them to validate their HTML, because that is not the way they
> > work. I think the tools should work the way most people are used to working
> > successfully on projects. (Not adapting the people to a way of working we
> > think they should work.)
>
> Java programmers are expected to develop valid java, why shouldn't HTML
> developers write valid HTML? Sorry, I just don't buy this. It's why there is
> a HTML standard. HTML is a file format, not free form text. There are a lot
> of unemployed web designers, they can be replaced! If the way people want to
> work is to create something that isn't HTML, only looks like it, well, it
> isn't HTML.
>
> I don't think we (as technical people) should be encouraging improper use of
> tools, even if it is the way they `like' to work. We should have the
> patience to educate people who don't come from a computer background on how
> computer software works and why doing something that is almost right makes it
> an artificial intelligence problem to deal with.
>
> Programmers have to make compromises to implement things within the
> constraints of the languages they use, html developers might have to
> compromise their design a little to be able to do it in html, but that is just
> part of using software.
>
> > taking the design goal of xmlc seriously, that it only supports valid
> > documents, then why not break the build if xmlc encounters any warning?
>
> I actually think it should. At least in the case where it needs to modify the
> contents to produce a valid document. This seems to be a big part of the
> problem you are having; JTidy tries to correct the document to valid HTML, but
> the results are not predictable.
>
> The way Tidy was designed to be used was it would correct the document, then
> one examines it and edits it as needed. However, when using XMLC, one can't
> edit the results, one has to go back to the original document. There is a big
> delay between creating and compiling. Since they are warnings, people ignore
> them (especially when some of the warnings are debatable as being actual
> warnings).
>
> Sorry, I made a big mistake by not figuring out how to turn these into
> errors. IMHO, this would be a good thing to add to XMLC, with an option to
> re-enable the document correction.
>
> > Why support the Swing parser? I think these two things are already
> > compromises.
>
> The swing parser is there for historic reasons; it was the original parser,
> since there was no other freely available parser at the time. It probably
> should not be used.
>
> If people really want to use invalid html, XMLC is the wrong tool. One of the
> tools based on string substitution is the way to go. However, IMHO, invalid
> html is the wrong thing to do no matter what tool they are using.
>
>
> Take care
> Mark
> _______________________________________________
> XMLC mailing list
> XMLC@xxxxxxxxxxx
> http://www.enhydra.org/mailman/listinfo.cgi/xmlc
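
Mark's point in the quoted message, that the DOM keeps the data and not the formatting of the input file, can be sketched with the XML parser and serializer bundled with the JDK. This is only an analogy under stated assumptions: it uses well-formed XML rather than HTML, it does not use XMLC or JTidy, and the class name `DomRoundTrip` and method `roundTrip` are invented for the example. Two source texts that differ in spelling (`<br></br>` versus `<br/>`, single versus double quotes, extra spaces inside a tag) parse to the same tree, so after a round trip there is no way to tell which one the author wrote.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; an analogy for what any DOM-based tool faces.
public class DomRoundTrip {

    /** Parses markup into a DOM tree, then serializes the tree back to text. */
    public static String roundTrip(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // Only the tree is available here; the original text is gone.
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String a = roundTrip("<p  class='x'>line<br></br>end</p>");
        String b = roundTrip("<p class=\"x\">line<br/>end</p>");
        System.out.println(a);
        System.out.println(b);
        // Different source texts, same tree, therefore same output.
        System.out.println(a.equals(b));
    }
}
```

The last line should print `true`: the serializer has to pick one spelling for the empty element and one quoting style, because the tree records neither. Reproducing the author's original formatting would require keeping the source text alongside the tree, which is the complexity Mark argues against.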